Abstract

Starting in 2008 the H1 and ZEUS experiments have been combining their
data in order to provide the most complete and accurate set of deep-inelastic
data as the legacy of HERA. This review presents these combinations,
both published and preliminary, and explores how they have been used to give 
information on the structure of the proton. The HERAPDF parton distribution 
functions (PDFs) are presented and compared with other current PDFs and 
with data from the Tevatron and LHC colliders. 

1 Introduction 

HERA was an electron(positron)-proton collider located at DESY, Hamburg. It ran in two phases:
HERA-I from 1992-2001 and HERA-II from 2003-2007. Two similar experiments, H1 and ZEUS, took data.
In HERA-I running each experiment collected ~100 pb$^{-1}$ of $e^+p$ data and ~15 pb$^{-1}$ of $e^-p$ data
with electron beam energy 27.5 GeV and proton beam energies of 820 and 920 GeV. In HERA-II running each
experiment took ~140 pb$^{-1}$ of $e^+p$ data and ~180 pb$^{-1}$ of $e^-p$ data with the same electron beam energy
and a proton beam energy of 920 GeV. In addition, before the shut-down in 2007, each experiment
took ~30 pb$^{-1}$ of data with reduced proton beam energies of 460 and 575 GeV.

Deep inelastic lepton-hadron scattering data have been used both to investigate the theory of the
strong interaction and to determine the momentum distributions of the partons within the nucleon. The
data from the HERA collider now dominate the world data on deep inelastic scattering since they cover
an unprecedented kinematic range: in $Q^2$, the (negative of the) invariant mass squared of the virtual
exchanged boson, $0.045 < Q^2 < 3 \times 10^{4}$ GeV$^2$; in Bjorken $x$, $6 \times 10^{-7} < x < 0.65$. Furthermore,
because the HERA experiments investigated $e^+p$ and $e^-p$, charged current (CC) and neutral current (NC)
scattering, information can be gained on flavour-separated up- and down-type quarks and antiquarks and
on the gluon, from its role in the scaling violations of perturbative quantum chromodynamics. From
2008, the H1 and ZEUS experiments began to combine their data in order to provide the most complete
and accurate set of deep-inelastic data as the legacy of HERA. Data on inclusive cross-sections have been
combined for the HERA-I phase of running and a preliminary combination has been made also using the
HERA-II data. This latter exercise also includes the data taken at lower proton beam energies in 2007.
Combination of $F_2^{c\bar c}$ data is also underway, and combination of $F_2^{b\bar b}$ data and of jet data is foreseen. The
HERA collaborations have used these combined data to determine parton distribution functions (PDFs).
These analyses have resulted in the HERAPDF sets. The present review concentrates on the information
on proton structure which has been gained from these HERA data.



2 Formalism 

In the quark parton model deep inelastic lepton-hadron scattering is pictured as in Fig. 1, where $l$, $l'$
represent leptons (lepton is taken to include antileptons, unless it is necessary to distinguish them), and
$N$ represents the nucleon. The associated four-vectors are $k$, $k'$ for the incoming and outgoing leptons
respectively, and $p$ for the target (or incoming) nucleon. The process is mediated by the exchange of a
virtual vector boson, $V^*$ ($\gamma$, $W$ or $Z$), with four-momentum given by

$$ q = k - k' . $$




Fig. 1: Schematic diagram of lepton-hadron scattering in the quark-parton model



Various Lorentz invariants are useful in the description of the kinematics of the process:

$$ s = (p + k)^2 , $$

the centre of mass energy squared for the $lp$ interaction,

$$ Q^2 = -q^2 , $$

the (negative of the) invariant mass squared of the virtual exchanged boson,

$$ x = Q^2 / (2\, p \cdot q) , $$

the Bjorken $x$ variable, which is interpreted in the quark-parton model as the fraction of the momentum
of the incoming nucleon taken by the struck quark, and

$$ y = p \cdot q / p \cdot k , $$

which gives a measure of the amount of energy transferred between the lepton and the hadron systems.
Note that (ignoring masses),

$$ Q^2 = sxy , $$

so that only two of these quantities are independent. Finally the centre of mass energy of the $V^*p$ system (or
equivalently the invariant mass of the final state hadronic system) is often denoted by $W$,

$$ W^2 = (q + p)^2 . $$
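As an illustration of these definitions, the following minimal sketch evaluates the invariants from four-vectors and checks the relation $Q^2 = sxy$ numerically. The beam energies are those quoted in the Introduction; the scattered-electron energy and angle are hypothetical values chosen only for the example, and all masses are neglected.

```python
import math

def dot(a, b):
    """Minkowski product with metric (+,-,-,-); vectors are (E, px, py, pz)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def dis_invariants(k, k_prime, p):
    """Return (s, Q2, x, y, W2) from the lepton and nucleon four-vectors."""
    q = tuple(ki - kpi for ki, kpi in zip(k, k_prime))   # q = k - k'
    p_plus_k = tuple(pi + ki for pi, ki in zip(p, k))
    q_plus_p = tuple(qi + pi for qi, pi in zip(q, p))
    s = dot(p_plus_k, p_plus_k)
    Q2 = -dot(q, q)
    x = Q2 / (2.0 * dot(p, q))
    y = dot(p, q) / dot(p, k)
    W2 = dot(q_plus_p, q_plus_p)
    return s, Q2, x, y, W2

# Massless head-on beams along z: 27.5 GeV electron, 920 GeV proton.
E_e, E_p = 27.5, 920.0
k = (E_e, 0.0, 0.0, -E_e)
p = (E_p, 0.0, 0.0, E_p)

# Hypothetical scattered electron: 20 GeV at a polar angle of 150 degrees,
# measured from the proton beam direction.
E_out, theta = 20.0, math.radians(150.0)
k_prime = (E_out, E_out * math.sin(theta), 0.0, E_out * math.cos(theta))

s, Q2, x, y, W2 = dis_invariants(k, k_prime, p)
print(f"sqrt(s) = {math.sqrt(s):.1f} GeV")
print(f"Q2 = {Q2:.1f} GeV^2, x = {x:.5f}, y = {y:.3f}, W = {math.sqrt(W2):.1f} GeV")
print(f"s*x*y = {s * x * y:.1f} GeV^2")
```

With massless beams $s = 2\,p \cdot k$, so the product $sxy$ reproduces $Q^2$ up to rounding.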

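Neutral current (NC) deep inelastic scattering is mediated by $\gamma$ and $Z$ exchange and the NC deep
inelastic $e^\pm p$ scattering cross sections can be expressed as

$$ \tilde\sigma^{\pm}_{NC} = \frac{d^2\sigma^{e^\pm p}_{NC}}{dx\,dQ^2} \cdot \frac{Q^4 x}{2\pi\alpha^2 Y_+} = \tilde F_2 \mp \frac{Y_-}{Y_+}\, x\tilde F_3 - \frac{y^2}{Y_+}\, \tilde F_L , \qquad (1) $$

where the electromagnetic coupling constant $\alpha$, the photon propagator and a helicity factor are absorbed
in the definition of a reduced cross section $\tilde\sigma$, and $Y_\pm = 1 \pm (1 - y)^2$. The structure functions $\tilde F_2$, $\tilde F_L$
and $x\tilde F_3$ are given by

$$ \tilde F_2 = F_2^{\gamma} - \kappa_Z v_e \cdot F_2^{\gamma Z} + \kappa_Z^2 (v_e^2 + a_e^2) \cdot F_2^{Z} , $$
$$ \tilde F_L = F_L^{\gamma} - \kappa_Z v_e \cdot F_L^{\gamma Z} + \kappa_Z^2 (v_e^2 + a_e^2) \cdot F_L^{Z} , $$
$$ x\tilde F_3 = \kappa_Z a_e \cdot xF_3^{\gamma Z} - \kappa_Z^2 \cdot 2 v_e a_e \cdot xF_3^{Z} , \qquad (2) $$

where $v_e$ and $a_e$ are the vector and axial-vector weak couplings of the electron and
$\kappa_Z(Q^2) = Q^2/[(Q^2 + M_Z^2)(4\sin^2\theta_W \cos^2\theta_W)]$. At low $Q^2$, the contribution of $Z$ exchange is negligible and
$x\tilde F_3 = 0$, $\tilde F_2 = F_2^{\gamma}$, $\tilde F_L = F_L^{\gamma}$ and $\tilde\sigma = F_2^{\gamma} - y^2 F_L^{\gamma}/Y_+$. The contribution of the term containing the structure function
$\tilde F_L$ is only significant for large values of $y$.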
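The size of the $Z$-exchange terms can be gauged with a short numerical sketch of $\kappa_Z(Q^2)$ and of the reduced cross section of Eq. (1). The values of $M_Z$ and $\sin^2\theta_W$ used here are assumptions (typical reference values, not taken from this review), and the structure-function inputs in the last lines are hypothetical numbers.

```python
M_Z = 91.1876           # GeV, assumed Z boson mass
SIN2_THETA_W = 0.2315   # assumed value of sin^2(theta_W)

def kappa_z(Q2):
    """kappa_Z(Q^2) = Q^2 / [(Q^2 + M_Z^2) * 4 sin^2(theta_W) cos^2(theta_W)]."""
    return Q2 / ((Q2 + M_Z ** 2) * 4.0 * SIN2_THETA_W * (1.0 - SIN2_THETA_W))

def y_plus(y):
    return 1.0 + (1.0 - y) ** 2

def y_minus(y):
    return 1.0 - (1.0 - y) ** 2

def reduced_nc(F2t, xF3t, FLt, y, lepton_charge):
    """Eq. (1): F2 -/+ (Y-/Y+) xF3 - (y^2/Y+) FL, upper sign for e+ beams."""
    sign = -1.0 if lepton_charge > 0 else 1.0
    return F2t + sign * y_minus(y) / y_plus(y) * xF3t - y ** 2 / y_plus(y) * FLt

for Q2 in (10.0, 1000.0, 10000.0):
    print(f"Q2 = {Q2:8.0f} GeV^2  kappa_Z = {kappa_z(Q2):.4f}")

# Low-Q2 limit (xF3 = 0) with hypothetical inputs F2 = 1.0, FL = 0.2 at y = 0.5.
sigma_low_q2 = reduced_nc(1.0, 0.0, 0.2, 0.5, +1)
print(f"reduced cross section = {sigma_low_q2:.3f}")
```

Because $\kappa_Z$ multiplies every $\gamma Z$ term and $\kappa_Z^2$ every pure $Z$ term, its smallness at low $Q^2$ is what justifies dropping them there.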

In the Quark Parton Model (QPM), $\tilde F_L = 0$, and the other structure functions are given by

$$ (F_2^{\gamma}, F_2^{\gamma Z}, F_2^{Z}) = [(e_u^2, 2e_u v_u, v_u^2 + a_u^2)(xU + x\bar U) + (e_d^2, 2e_d v_d, v_d^2 + a_d^2)(xD + x\bar D)] , $$
$$ (xF_3^{\gamma Z}, xF_3^{Z}) = 2[(e_u a_u, v_u a_u)(xU - x\bar U) + (e_d a_d, v_d a_d)(xD - x\bar D)] , \qquad (3) $$

such that at low $Q^2$

$$ F_2^{\gamma} = e_u^2 (xU + x\bar U) + e_d^2 (xD + x\bar D) , \qquad (4) $$

where $e_u$, $e_d$ denote the electric charge of up- or down-type quarks while $v_{u,d}$ and $a_{u,d}$ are the vector and
axial-vector weak couplings of the up- or down-type quarks. Here $xU$, $xD$, $x\bar U$ and $x\bar D$ denote the sums
of up-type, of down-type and of their anti-quark momentum distributions, respectively. In the QPM these
distributions are functions of Bjorken $x$ only, and not also of $Q^2$ as they would be in full generality; this is
what is meant by Bjorken scaling. Below the $b$ quark mass threshold, these sums are related to the quark
distributions as follows

$$ xU = xu + xc , \quad x\bar U = x\bar u + x\bar c , \quad xD = xd + xs , \quad x\bar D = x\bar d + x\bar s , \qquad (5) $$

where $xs$ and $xc$ are the strange and charm quark distributions. Assuming symmetry between sea quarks
and anti-quarks, the valence quark distributions result from

$$ xu_v = xU - x\bar U , \quad xd_v = xD - x\bar D . \qquad (6) $$
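Eqs. (4) and (6) can be illustrated with a few lines of code. This is only a sketch: the four input numbers are hypothetical values of the summed distributions at a single $x$ point, not results of any fit.

```python
E_U, E_D = 2.0 / 3.0, -1.0 / 3.0   # electric charges of up- and down-type quarks

def f2_qpm(xU, xUbar, xD, xDbar):
    """Eq. (4): photon-exchange F2 from the summed quark momentum distributions."""
    return E_U ** 2 * (xU + xUbar) + E_D ** 2 * (xD + xDbar)

def valence(xU, xUbar, xD, xDbar):
    """Eq. (6): valence distributions as quark minus anti-quark sums."""
    return xU - xUbar, xD - xDbar

# Hypothetical values at one x point (illustration only).
xU, xUbar, xD, xDbar = 0.60, 0.15, 0.40, 0.20

f2 = f2_qpm(xU, xUbar, xD, xDbar)
xu_v, xd_v = valence(xU, xUbar, xD, xDbar)
print(f"F2 = {f2:.3f}, x u_v = {xu_v:.2f}, x d_v = {xd_v:.2f}")
```

The charge-squared weights (4/9 against 1/9) show directly why electromagnetic scattering on a proton is dominated by up-type quarks.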

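Charged current (CC) deep inelastic scattering is mediated by $W^+$ and $W^-$ exchange and the CC
deep inelastic $e^\pm p$ scattering cross sections can be expressed as

$$ \tilde\sigma^{\pm}_{CC} = \frac{2\pi x}{G_F^2} \left[ \frac{M_W^2 + Q^2}{M_W^2} \right]^2 \frac{d^2\sigma^{e^\pm p}_{CC}}{dx\,dQ^2} , \qquad (7) $$

with $G_F$ the Fermi constant, where analogously to Eq. 1

$$ \tilde\sigma^{\pm}_{CC} = \frac{Y_+}{2} W_2^{\pm} \mp \frac{Y_-}{2}\, xW_3^{\pm} - \frac{y^2}{2} W_L^{\pm} . \qquad (8) $$

In the QPM, $W_L^{\pm} = 0$, and the CC structure functions represent sums and differences of quark and
anti-quark-type distributions depending on the charge of the lepton beam:

$$ W_2^{+} = x\bar U + xD , \quad xW_3^{+} = xD - x\bar U , \quad W_2^{-} = xU + x\bar D , \quad xW_3^{-} = xU - x\bar D . \qquad (9) $$

From these equations it follows that

$$ \tilde\sigma^{+}_{CC} = x\bar U + (1 - y)^2 xD , \quad \tilde\sigma^{-}_{CC} = xU + (1 - y)^2 x\bar D . \qquad (10) $$

Therefore the NC and CC measurements may be used to determine the combined sea quark distribution
functions, $x\bar U$ and $x\bar D$, and the valence quark distributions, $xu_v$ and $xd_v$.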
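The step from Eqs. (8) and (9) to Eq. (10) uses only $(Y_+ + Y_-)/2 = 1$ and $(Y_+ - Y_-)/2 = (1-y)^2$. A small numerical cross-check, again with hypothetical input values for the summed distributions, might look as follows.

```python
def sigma_cc_from_w(W2, xW3, WL, y, lepton_charge):
    """Eq. (8): reduced CC cross section from the CC structure functions."""
    y_plus = 1.0 + (1.0 - y) ** 2
    y_minus = 1.0 - (1.0 - y) ** 2
    sign = -1.0 if lepton_charge > 0 else 1.0   # upper sign in Eq. (8) is e+
    return 0.5 * y_plus * W2 + sign * 0.5 * y_minus * xW3 - 0.5 * y ** 2 * WL

def sigma_cc_qpm(xU, xUbar, xD, xDbar, y, lepton_charge):
    """Eq. (10): QPM result for e+ p (charge > 0) or e- p scattering."""
    if lepton_charge > 0:
        return xUbar + (1.0 - y) ** 2 * xD
    return xU + (1.0 - y) ** 2 * xDbar

# Hypothetical values at one x point (illustration only).
xU, xUbar, xD, xDbar = 0.60, 0.15, 0.40, 0.20

max_diff = 0.0
for y in (0.1, 0.3, 0.5, 0.7, 0.9):
    # Eq. (9) with W_L = 0, inserted into Eq. (8).
    plus = sigma_cc_from_w(xUbar + xD, xD - xUbar, 0.0, y, +1)
    minus = sigma_cc_from_w(xU + xDbar, xU - xDbar, 0.0, y, -1)
    max_diff = max(max_diff,
                   abs(plus - sigma_cc_qpm(xU, xUbar, xD, xDbar, y, +1)),
                   abs(minus - sigma_cc_qpm(xU, xUbar, xD, xDbar, y, -1)))

print(f"largest difference between Eq. (8)+(9) and Eq. (10): {max_diff:.2e}")
```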

Perturbative QCD extends the formalism of the QPM such that the parton momentum distributions
(PDFs) become functions of $Q^2$ as well as $x$. However this scaling violation induces only a logarithmic
dependence on $Q^2$, as described by the DGLAP equations. The DGLAP equations are coupled
equations for the change of the quark, antiquark and gluon densities as $\ln Q^2$ changes,

$$ \frac{\partial}{\partial \ln Q^2} \begin{pmatrix} q_i(x,Q^2) \\ g(x,Q^2) \end{pmatrix} = \frac{\alpha_s(Q^2)}{2\pi} \sum_j \int_x^1 \frac{d\xi}{\xi} \begin{pmatrix} P_{q_i q_j}(\frac{x}{\xi},\alpha_s(Q^2)) & P_{q_i g}(\frac{x}{\xi},\alpha_s(Q^2)) \\ P_{g q_j}(\frac{x}{\xi},\alpha_s(Q^2)) & P_{gg}(\frac{x}{\xi},\alpha_s(Q^2)) \end{pmatrix} \begin{pmatrix} q_j(\xi,Q^2) \\ g(\xi,Q^2) \end{pmatrix} , \qquad (11) $$

where the $q_j$ are taken to include both quark and antiquark distributions. The splitting functions are
expanded as power series in the strong coupling $\alpha_s$, for example

$$ P_{qg}(z,\alpha_s) = P_{qg}^{(0)}(z) + \frac{\alpha_s}{2\pi} P_{qg}^{(1)}(z) + \ldots , $$

and are calculable within pQCD. Thus the gluon momentum distribution influences the quark distributions
through its contribution to their scaling violations, and the gluon PDF is determined by analysing
the $Q^2$ dependence of the data.

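To leading order in pQCD the equations for the structure functions in terms of the PDFs are still
given by the QPM expressions. However beyond leading order a convolution of parton distributions and
QCD-calculable coefficient functions is necessary,

$$ F_2(x,Q^2) = x \int_x^1 \frac{d\xi}{\xi} \left[ \sum_i e_i^2\, C_q\!\left(\frac{x}{\xi},\alpha_s\right) q_i(\xi,Q^2) + e^2\, C_g\!\left(\frac{x}{\xi},\alpha_s\right) g(\xi,Q^2) \right] , \qquad (12) $$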



where $e^2 = \sum_i e_i^2$, and the sums run over all active quark and antiquark flavours. $C_q$ and $C_g$ are the
coefficient functions, which may also be expanded as power series in $\alpha_s$,

$$ C_q(z,\alpha_s) = \delta(1 - z) + \frac{\alpha_s}{2\pi} C_q^{1}(z) + \ldots $$
$$ C_g(z,\alpha_s) = \frac{\alpha_s}{2\pi} C_g^{1}(z) + \ldots $$

In the QPM the transverse momentum of the partons is assumed to be zero and one of the consequences
of this for spin-1/2 quarks is that the longitudinal structure function ($F_L = F_2 - 2xF_1$) is zero. However
this is no longer true beyond leading order, and $F_L$ is given by



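$$ F_L(x,Q^2) = \frac{\alpha_s}{4\pi} \left[ \frac{16}{3} \int_x^1 \frac{dy}{y} \left(\frac{x}{y}\right)^2 F_2(y,Q^2) + 8 \sum_q e_q^2 \int_x^1 \frac{dy}{y} \left(\frac{x}{y}\right)^2 \left(1 - \frac{x}{y}\right) y g(y,Q^2) \right] , \qquad (13) $$

where here the sum runs over the active quark flavours.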



Thus the gluon distribution also influences the longitudinal structure function particularly at low x. 



3 Data sets 

The deep inelastic $ep$ scattering cross sections depend on the centre-of-mass energy squared $s$ and on two other
independent kinematic variables, usually taken to be $Q^2$ and $x$. The salient feature of the HERA collider
experiments is the possibility to determine $x$ and $Q^2$ from the scattered electron, or from the
hadronic final state, or using a combination of the two. The choice of the most appropriate kinematic
reconstruction method for a given phase space region is based on resolution, measurement accuracy and
radiative correction effects and has been optimised differently for the two HERA experiments H1 and
ZEUS, as described in the original publications. The use of different reconstruction techniques by the
two experiments contributes to improved accuracy when the data sets are combined, since although the
detectors were built following similar physics considerations they opted for different technical solutions,
both for the calorimetric and the tracking measurements. Thus the experiments can calibrate each other.

Table 1: H1 and ZEUS data sets used for the combination.

3.1 The combined inclusive HERA-I data set 

The inclusive cross-section data collected by each experiment in the HERA-I running period have been
combined [5]. A summary of the data used in this analysis is given in Table 1. The NC data cover a wide
range in $x$ and $Q^2$. The lowest $Q^2 > 0.045$ GeV$^2$ data come from the measurements of ZEUS using the
BPC and BPT [11, 12]. The $Q^2$ range from 0.2 GeV$^2$ to 1.5 GeV$^2$ is covered using special HERA runs,
in which the interaction vertex position was shifted forward allowing for larger angles of the backward
scattered electron to be accepted [6, 13]. The lowest $Q^2$ for the shifted vertex data was reached using
events in which the effective electron beam energy was reduced by initial state radiation [6]. Values of
$Q^2 > 1.5$ GeV$^2$ are measured using the nominal vertex settings. For $Q^2 < 10$ GeV$^2$, the cross section
is very high and the data were collected using dedicated trigger setups [6, 14]. The highest accuracy of
the cross-section measurement is achieved for $10 < Q^2 < 100$ GeV$^2$. For $Q^2 > 100$ GeV$^2$, the
statistical uncertainty of the data becomes relatively large. The high-$Q^2$ data included here were collected
with positron [8, 10, 14, 18] and with electron [9, 16] beams. The CC data for $e^+p$ and $e^-p$ scattering
cover the range $300 < Q^2 < 30000$ GeV$^2$.

The full details of the combination procedure are given in ref. [5]. The combination of the data
sets uses the $\chi^2$ minimisation method described in [6]. The $\chi^2$ function takes into account the correlated
systematic uncertainties for the H1 and ZEUS cross-section measurements. Global normalisations of
the data sets are split into an overall normalisation uncertainty of 0.5%, common to all data sets, due to
uncertainties of higher order corrections to the Bethe-Heitler process used for the luminosity calculation,
and experimental uncertainties which are treated as correlated systematic sources. Some sources of
point-to-point correlated uncertainties are common for CC and NC data and for several data sets of the same
experiment. The systematic uncertainties were treated as independent between H1 and ZEUS apart from
the 0.5% overall normalisation uncertainty. All the NC and CC cross-section data from H1 and ZEUS
are combined in one simultaneous minimisation. Therefore resulting shifts of the correlated systematic
uncertainties propagate coherently to both CC and NC data.

There are in total 110 sources of correlated systematic uncertainty, including global normalisations,
characterising the separate data sets. None of these systematic sources shifts by more than $2\sigma$
of the nominal value in the averaging procedure. The absolute normalisation of the combined data
set is to a large extent defined by the most precise measurements of the NC $e^+p$ cross-section in the
$10 < Q^2 < 100$ GeV$^2$ kinematic range. Here the H1 and ZEUS results move towards each
other and the other data sets follow this adjustment.



The influence of several correlated systematic uncertainties is reduced significantly for the averaged
result. For example, the uncertainty due to the H1 LAr calorimeter energy scale is halved while the
uncertainty due to the ZEUS photoproduction background is reduced by a factor of 3. There are two main
reasons for these significant reductions. Since H1 and ZEUS use different reconstruction methods, similar
systematic sources influence the measured cross section differently as a function of $x$ and $Q^2$. Therefore,
requiring the cross sections to agree at all $x$ and $Q^2$ constrains the systematics efficiently. In addition,
for some regions of the phase space, one of the two experiments has superior precision compared to the
other. For these regions, the less precise measurement is fitted to the more precise one, with a simultaneous
reduction of the correlated systematic uncertainty. This reduction propagates to the other average
points, including those which are based solely on the measurement from the less precise experiment.
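The mechanism by which averaging constrains a correlated systematic source can be seen in a toy version of the $\chi^2$ minimisation. The sketch below is an assumption-laden simplification, not the procedure of ref. [5]: it treats systematics as additive (the nominal combination treats them as multiplicative), gives each experiment a single correlated source with nuisance parameter $b_e$, and uses invented numbers throughout. The function names `solve` and `average` are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def average(mu, delta, gamma):
    """
    Minimise chi2 = sum_{e,i} ((mu[e][i] - m[i] - gamma[e][i]*b[e]) / delta[e][i])^2
                    + sum_e b[e]^2
    over the averaged values m[i] and the systematic shifts b[e].
    Returns (m, b, sigma_b), with sigma_b the post-fit uncertainty of each b[e].
    """
    n_exp, n_pts = len(mu), len(mu[0])
    n_par = n_pts + n_exp
    # Normal equations N * theta = v for theta = (m_0..m_{N-1}, b_0..b_{E-1}).
    N = [[0.0] * n_par for _ in range(n_par)]
    v = [0.0] * n_par
    for e in range(n_exp):
        j = n_pts + e
        N[j][j] += 1.0                      # penalty term b_e^2
        for i in range(n_pts):
            w = 1.0 / delta[e][i] ** 2
            g = gamma[e][i]
            N[i][i] += w
            N[i][j] += w * g
            N[j][i] += w * g
            N[j][j] += w * g * g
            v[i] += w * mu[e][i]
            v[j] += w * g * mu[e][i]
    theta = solve(N, v)
    sigma_b = []
    for e in range(n_exp):                  # diagonal elements of N^-1
        unit = [0.0] * n_par
        unit[n_pts + e] = 1.0
        sigma_b.append(solve(N, unit)[n_pts + e] ** 0.5)
    return theta[:n_pts], theta[n_pts:], sigma_b

# Toy input: three points, two experiments, invented numbers.
truth = [1.0, 0.8, 0.6]
gamma = [[0.03, 0.03, 0.03], [0.02, 0.0, -0.02]]   # one-sigma systematic shifts
delta = [[0.01] * 3, [0.01] * 3]                   # uncorrelated uncertainties
# Experiment 0 is displaced by one sigma of its systematic; experiment 1 is not.
mu = [[t + g for t, g in zip(truth, gamma[0])], truth[:]]

m, b, sigma_b = average(mu, delta, gamma)
print("averaged values:", [round(val, 4) for val in m])
print("fitted shifts b:", [round(val, 3) for val in b])
print("post-fit sigma(b):", [round(val, 3) for val in sigma_b])
```

Before the fit each $b_e$ has unit uncertainty by construction; a post-fit value below one is the reduction of the correlated systematic uncertainty described above, and it arises here because the two experiments have systematics with different point-to-point dependence.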

In addition to the 110 sources of systematic uncertainty which result from the separate data sets
there are three sources of procedural uncertainty deriving from the choices made in the combination.
Firstly, all systematic uncertainties were treated as multiplicative. This has been varied by treating all
sources bar the normalisations as additive, and the difference is used to estimate a procedural systematic
error. Secondly, the correlated systematics from H1 and ZEUS were treated as uncorrelated between the
experiments, but this may not be completely true due to some similarity of methods. An alternative
combination procedure treats 12 sources of similar systematics as correlated. This results in some differences
in the result for the photoproduction background and the hadronic energy scale, and these differences are
used to estimate two further procedural systematic errors.

The data averaging procedure results in a set of measurements for each process: the average cross
section value at a point $i$, and its relative correlated systematic, relative statistical and relative uncorrelated
systematic uncertainties. The number of degrees of freedom, $n_{dof}$, is calculated as the
difference between the total number of measurements and the number of combined data points. The
value of $\chi^2_{min}/n_{dof}$ is a measure of the consistency of the data sets.

Tabulated results for the average NC and CC cross sections and the structure function $F_2$ together
with statistical, uncorrelated systematic and procedural uncertainties are given in ref. [5]. The total
integrated luminosity of the combined data set corresponds to about 200 pb$^{-1}$ for $e^+p$ and 30 pb$^{-1}$ for
$e^-p$. In total 1402 data points are combined to 741 cross-section measurements. The data show good
consistency, with $\chi^2/n_{dof} = 636.5/656$, and there are no tensions between the input data sets.

For $Q^2 > 100$ GeV$^2$ the precision of the H1 and ZEUS measurements is about equal and thus
the systematic uncertainties are reduced uniformly. For $2.5 < Q^2 < 100$ GeV$^2$ and $Q^2 < 1$ GeV$^2$ the
precision is dominated by the H1 [6, 7] and ZEUS [12] measurements, respectively. Therefore the overall
reduction of the uncertainties is smaller, and it is essentially obtained from the reduction of the correlated
systematic uncertainty. The total uncertainty of the combined measurement is typically smaller than 2%
for $3 < Q^2 < 500$ GeV$^2$ and reaches 1% for $20 < Q^2 < 100$ GeV$^2$. The uncertainties are larger for
high inelasticity $y > 0.6$ due to the photoproduction background.

In Fig. 2 averaged data are compared to the input H1 and ZEUS data, illustrating the improvement
in precision. Because of the reduction in size of the systematic error this improvement is far better than
would be expected simply from the rough doubling of statistics which combining the two experiments
represents. In Fig. 3 the combined NC $e^+p$ data at very low $Q^2$ are shown. In Fig. 4 the NC reduced
cross section, for $Q^2 > 1$ GeV$^2$, is shown as a function of $Q^2$ for the HERA combined $e^+p$ data and
for fixed-target data [20, 21] across the whole of the measured kinematic plane. The combined NC $e^\pm p$
reduced cross sections are compared in the high-$Q^2$ region in Fig. 5. In Figs. 6 and 7 the combined data
set is shown for CC scattering at high $Q^2$. The HERAPDF1.0 fit, described in Sec. 4, used these data as
input. It is superimposed on the data in the kinematic region suitable for the application of perturbative
QCD.



Fig. 2: HERA combined NC $e^+p$ reduced cross section as a function of $Q^2$ for six $x$-bins compared to the separate H1 and
ZEUS data input to the averaging procedure. The individual measurements are displaced horizontally for better visibility.

Fig. 3: HERA combined NC $e^+p$ reduced cross section at very low $Q^2$.



Fig. 4: HERA combined NC $e^+p$ reduced cross section and fixed-target data as a function of $Q^2$. The HERAPDF1.0 fit is
superimposed. The bands represent the total uncertainty of the fit. The dashed lines are shown for $Q^2$ values not included in
the QCD analysis.




Fig. 5: HERA combined NC $e^\pm p$ reduced cross sections at high $Q^2$. The HERAPDF1.0 fit is superimposed. The bands
represent the total uncertainty of the fit.



Fig. 6: HERA combined CC $ep$ reduced cross section. The HERAPDF1.0 fit is superimposed. The bands represent the total
uncertainty of the fit.

Fig. 7: HERA combined CC $ep$ reduced cross section. The HERAPDF1.0 fit is superimposed. The bands represent the total
uncertainty of the fit.

Table 2: Luminosities of the HERA-II data sets.

Table 3: Luminosities of the data sets for the low energy running. The first 6 data sets have already been combined. The final 3
will be combined.

3.2 HERA-II inclusive data sets 

The published inclusive data combination does not include the data from the HERA-II running period
2003-2007. These data were collected for both electron and positron beams, polarised both positively
and negatively, at $\sqrt{s} = 318$ GeV. The polarised data can be used to measure electroweak parameters [22].
This is beyond the scope of the present review. For investigation of the parton distribution functions these
data have been combined into unpolarised cross-sections. Details of the luminosities collected are given
in Table 2.

A preliminary combination of these data with the HERA-I data has been made [29] using the
same $\chi^2$ minimisation method, such that a new set of measurements for each process, NC and CC $e^+p$
and $e^-p$, results. There are in total 131 sources of correlated systematic uncertainty, characterising the
separate data sets, and three sources of procedural uncertainty, plus the overall normalisation uncertainty
of 0.5%, as before. These preliminary combined data are shown in Figs. 8-10. Comparison of these
figures with Figs. 5-7 shows how much the addition of the HERA-II data improves precision at high $x$
and $Q^2$. The HERAPDF1.5 fit, described in Sec. 4, used these data as input. It is superimposed on
the data in the figures.

3.3 $F_L$ data

During the final running period the proton beam was run at three different energies (920, 575, 460 GeV)
and NC $e^+p$ data were collected. These data access high $y$ and have been used to measure the longitudinal
structure function $F_L$ [31, 32]. The luminosities for the input data for the combination are specified
in Table 3.

The reduced cross-section data from these runs have been combined [33] and the combination is
shown in Fig. 11. Using these data a combined measurement of $F_L$ can be made [33]. Recall that the
NC $e^+p$ reduced cross section is given by $\tilde\sigma = F_2 - y^2 F_L/Y_+$ for $Q^2 \ll M_Z^2$. Since $Q^2 = sxy$ one



Fig. 8: HERA I+II preliminary combined NC $e^\pm p$ reduced cross sections at high $Q^2$. The HERAPDF1.5 fit is superimposed.
The bands represent the total uncertainty of the fit.




Fig. 9: HERA I+II preliminary combined CC $ep$ reduced cross section. The HERAPDF1.5 fit is superimposed. The bands
represent the total uncertainty of the fit.

Fig. 10: HERA I+II preliminary combined CC $ep$ reduced cross section. The HERAPDF1.5 fit is superimposed. The bands
represent the total uncertainty of the fit.




Fig. 11: HERA combined NC $ep$ reduced cross sections from running at three different proton beam energies. The predictions
of HERAPDF1.0 are superimposed.

Fig. 12: The slope of $\tilde\sigma$ vs $f(y) = y^2/Y_+$ for various $x$ bins at $Q^2 = 32$ GeV$^2$.

Table 4: Luminosities of the $F_2^{c\bar c}$ data sets.



needs measurements at different $s$ values in order to access different $y$ values for the same $x$, $Q^2$ point.
The structure function $F_L$ is measured as the slope of a linear fit of $\tilde\sigma$ versus $f(y) = y^2/Y_+$, in $x$, $Q^2$ bins.
Fig. 12 shows an example of such a fit, for various $x$ values, at $Q^2 = 32$ GeV$^2$. The measured $F_L$ is
shown, averaged in $x$ as a function of $Q^2$, in Fig. 13 with various theoretical predictions superimposed.
At low $x$, NLO QCD in the DGLAP formalism predicts that this structure function is strongly related to
the gluon PDF, see Eq. 13.
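The slope method can be sketched in a few lines. The example below generates noise-free pseudo-data at a single ($x$, $Q^2$) point for the three proton beam energies and recovers $F_2$ and $F_L$ from an unweighted straight-line fit. The structure-function values and the chosen $x$ are hypothetical, masses are neglected in $s$, and a real analysis would weight the points by their uncertainties.

```python
E_E = 27.5                                 # electron beam energy in GeV
PROTON_ENERGIES = (920.0, 575.0, 460.0)    # proton beam energies in GeV

def f_of_y(y):
    """f(y) = y^2 / Y_+ with Y_+ = 1 + (1 - y)^2."""
    return y * y / (1.0 + (1.0 - y) ** 2)

def fit_line(xs, ys):
    """Unweighted least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical structure-function values at one (x, Q^2) point.
x_bj, Q2 = 0.0008, 32.0
F2_true, FL_true = 1.30, 0.25

fs, sigmas = [], []
for E_p in PROTON_ENERGIES:
    s = 4.0 * E_E * E_p                    # massless head-on beams
    y = Q2 / (s * x_bj)                    # from Q^2 = s x y
    fs.append(f_of_y(y))
    sigmas.append(F2_true - f_of_y(y) * FL_true)   # reduced cross section
    print(f"E_p = {E_p:5.0f} GeV  y = {y:.3f}  f(y) = {fs[-1]:.3f}")

intercept, slope = fit_line(fs, sigmas)
F2_fit, FL_fit = intercept, -slope         # sigma = F2 - f(y) * FL
print(f"F2 = {F2_fit:.3f}, FL = {FL_fit:.3f}")
```

The lever arm in $f(y)$ comes entirely from the change in $s$, which is why the reduced-energy runs were needed.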

This combination will be updated to include the recently published H1 data, which extend to lower
$Q^2$ [34]. The luminosities used for this extension are given in Table 3 and the $F_L$ measurement from
these H1 data is shown in Fig. 14.

3.4 $F_2^{c\bar c}$ and $F_2^{b\bar b}$ data sets

A preliminary combination has been made of data on $F_2^{c\bar c}$ [35] from various different methods of tagging
charm: using the $D^*$, using the vertex detectors to see the displaced decay vertex, using direct $D^0$, $D^+$
production identified using the vertex detectors, and identifying semi-leptonic charm decays via muons,
also using the vertex detectors. The details of the data sets used in the combination are given in Table 4.

Fig. 13: The HERA combined measurement of $F_L$ averaged in $x$ at a given value of $Q^2$.

Data Set                                 Luminosity in pb$^{-1}$    Reference
High $Q^2$ norm. inc. jets 96/07 H1      395                        [47]
Low $Q^2$ inc. jets 96/00 H1             43.5                       [48]
High $Q^2$ inc. jets 96/97 ZEUS          38.6                       [49]
High $Q^2$ inc. jets 98/00 ZEUS          82                         [50]

Table 5: Luminosities of the jet data sets.



The results of the $F_2^{c\bar c}$ combination compared to the separate measurements which went into it are
shown in Fig. 15. The $F_2^{c\bar c}$ combination is shown compared to the predictions of HERAPDF1.0 in Fig. 16.

Data on $F_2^{b\bar b}$ have not yet been combined. A recent comparison of the separate H1 [44] and ZEUS [45]
results is shown in Fig. 17.



3.5 Jet data sets 

Jet data may also be used to constrain the PDFs. So far H1 and ZEUS jet data have not been combined but
some separate H1 and ZEUS jet data sets have been input to the HERAPDF fits in order to exploit their
ability to constrain the gluon PDF and to make a determination of the value of $\alpha_s(M_Z)$ simultaneously
with the PDF determination [46]. The jet data which have been used are summarised in Table 5. These
data are illustrated in Figs. 18, 19, 20 and 21.



4 Extraction of parton densities 

This section discusses how parton momentum densities are extracted from the HERA data.¹ There are
several PDF fits to different HERA data sets. The HERAPDF1.0 NLO set used only the HERA-I combined
data [5]. However there are studies, using the same fit formalism, including the preliminary
low-energy combined data [51] and using the preliminary combined $F_2^{c\bar c}$ data [52]. The HERAPDF1.5 NLO
set used the HERA-I+II preliminary combined data [29] and the same fit formalism. Studies were
also made using the same data but extending the parametrisation: HERAPDF1.5f NLO [46]. This extended
parametrisation was then used for the HERAPDF1.5 NNLO fit [53]. Subsequent fits also used
the extended parametrisation: the HERAPDF1.6 NLO fit [46], which included the HERA-I+II inclusive data
and separate H1 and ZEUS jet data; and the HERAPDF1.7 NLO fit [54], which used all the data described in the
previous sections (HERA-I+II inclusive data and low energy inclusive data, $F_2^{c\bar c}$ data and H1 and ZEUS
jet data sets).

¹ Open access code for the HERA PDF fits, and many other useful utilities, is available from the HERAFitter website
http://herafitter.hepforge.org

Fig. 17: The H1 and ZEUS measurements of $F_2^{b\bar b}$.

Fig. 18: ZEUS 96/97 measurements of the inclusive jet cross section, as a function of $E_T^{jet}$ in the Breit frame for various $Q^2$
bins.

Fig. 19: ZEUS 98/00 measurements of the inclusive jet cross section, as a function of $E_T^{jet}$ in the Breit frame for various $Q^2$
bins.

Fig. 20: H1 HERA-I+II measurements of the normalised inclusive jet cross section, as a function of $p_T$ for various $Q^2$ bins.

Fig. 21: H1 HERA-I measurements of the low-$Q^2$ inclusive jet cross section, as a function of $p_T^{jet}$ for various $Q^2$ bins.

The relationship of the measured cross-sections to the parton distributions, presented in Sec. 2,
is not so straightforward beyond LO since the evolved parton distributions must be convoluted with
coefficient functions and all types of parton may contribute to a particular structure function through
the evolution. However, the simple LO formulae still give a good guide to the major contributions. The
cross-sections for NC and CC, $e^+$ and $e^-$ scattering on protons provide enough information to extract the
$u$ and $d$ valence PDFs and the $\bar U$ and $\bar D$ PDFs, as well as the gluon PDF from scaling violation. Briefly:

• HERA NC $e^+$ reduced cross-section data at low $Q^2$ give information on the shape of the sea
distribution in the low-$x$ region, whereas the high-$Q^2$ NC $e^+$ and $e^-$ cross-sections not only extend
the coverage to high $x$ ($x < 0.65$) but also provide information on the valence combination
$x\tilde F_3 = x(B_u^0 u_v + B_d^0 d_v)$, which is extracted from the difference between the high-$Q^2$ NC $e^+$ and $e^-$ cross
sections. The $x$ range of the valence measurement is $0.01 < x < 0.65$.

• HERA CC data give further information on flavour separation. The $e^-$ cross-section at high $x$ is
$u$-valence dominated and the $e^+$ cross-section is $d$-valence dominated, giving unique information
on the $d$ quark. Fixed proton target data are $u$-quark dominated, and so historically information on
the $d$ quark has been extracted from deuterium target data or from neutrino scattering. However,
each of these methods has difficulties. The neutrino data use heavy isoscalar targets such that
uncertain nuclear corrections [55] are necessary. The deuterium target data also need some nuclear
binding corrections [56] and extraction of the $d$ quark is dependent on the assumption of strong
isospin invariance.

• NC data on $F_2$ have also been used to constrain the gluon distribution. Since the gluon does
not couple to the photon it does not enter the expressions for the structure functions at all in the
QPM. However, it is constrained by the momentum sum rule, and by the way that gluon to
quark-antiquark splitting feeds into the sea distributions (from the $P_{qg}$ term in the DGLAP equations).
The shape of the gluon distribution extracted from a DGLAP QCD fit will be correlated with the
value of $\alpha_s$, since an increase in $\alpha_s$ increases the negative contribution from the $P_{qq}$ term but this
may be compensated by a positive contribution from the $P_{qg}$ term if the gluon is made harder.
Hence, a fixed value of $\alpha_s$, as determined from independent data, has often been assumed in PDF
fits. HERA data are invaluable in constraining the low-$x$ gluon distribution, since at small $x$ QCD
evolution becomes gluon dominated and the uncertainties referred to above are reduced. This is
because $F_2$ is essentially given by the singlet sea quark distribution for $x < 0.01$, and this in turn
is driven by the gluon through the $P_{qg}$ term in Eq. 11. The approximate LO relationship

$$ xg(x) \simeq \frac{3\pi}{e^2 \alpha_s} \frac{dF_2(x/2,Q^2)}{d\ln Q^2} \qquad (14) $$

illustrates how the gluon distribution depends on the scaling violation of $F_2$ at low $x$. Hence in
this kinematic region the gluon distribution may be obtained almost directly from the $F_2$ scaling
violation data.

• Jet production data from HERA can give more direct information on the gluon since vector-boson
gluon fusion (BGF) to quark-anti-quark pairs makes a significant contribution to final state jet
production. Such data have also been input to the PDF fits to constrain the gluon distribution in the
$x$ range $0.01 < x < 0.1$, and to simultaneously determine $\alpha_s(M_Z)$, see Sec. 4.1.5.



• The longitudinal structure function $F_L$ can also give information on the gluon, as can be seen from
Eq. 13. At low $x$ the dominant contribution comes from the gluon and the integral over the
gluon distribution approximates to a $\delta$ function such that a measurement of $F_L(x,Q^2)$ is almost
a direct measurement of the gluon distribution $yg(y,Q^2)$ at $y = 2.5x$ [57]. The heavy quark
structure functions $F_2^{c\bar c}$ and $F_2^{b\bar b}$ may also yield information on the gluon since heavy quarks are
generated by the BGF process. However, currently such data are most useful for distinguishing
between different schemes for heavy quark production and fixing the value of the heavy quark
mass parameters that enter into these schemes, see Sec. 4.1.2.

Perturbative QCD predicts the $Q^2$ evolution of the parton distributions, but not the $x$ dependence.
The parton distributions are extracted by performing a direct numerical integration of the DGLAP
equations at NLO and NNLO [58]. For most PDF extractions (the notable exception is the NNPDF analysis)
a parametrised analytic shape for the parton distributions (valence, sea and gluon) is assumed to be valid
at some starting value of $Q^2 = Q_0^2$. This starting value is arbitrary, but should be large enough to ensure
that $\alpha_s(Q_0^2)$ is small enough for perturbative calculations to be applicable. For the HERAPDF the value
$Q_0^2 = 1.9$ GeV$^2$ is chosen such that the starting scale is below the charm mass threshold, $Q_0^2 < m_c^2$. Then
the DGLAP equations are used to evolve the parton distributions up to higher $Q^2$ values, where they are
convoluted with coefficient functions to make predictions for the structure functions and cross sections.
These predictions are then fitted to data to determine the PDF parameters, and thus the shapes of the
parton distributions at the starting scale and, through evolution, at any other value of $Q^2$.

The QCD evolution is performed using the programme QCDNUM [59]. The HERAPDF uses the
$\overline{\rm MS}$ renormalisation scheme, with the renormalisation and factorisation scales chosen to be $Q^2$. The light
quark coefficient functions [60,61] are calculated using the programme QCDNUM. The heavy quark
coefficient functions are calculated in the general-mass variable-flavour-number scheme of [62], with
recent modifications and extension to NNLO [63,64]. (This scheme will be called the RT-VFN scheme.)
The heavy quark masses for the central fit were chosen to be $m_c = 1.4$ GeV and $m_b = 4.75$ GeV,
and the strong coupling constant was fixed to $\alpha_s(M_Z) = 0.1176$. These choices are varied to evaluate
model uncertainties. The predictions are then fitted to the combined HERA data sets on differential cross
sections for NC and CC $e^+p$ and $e^-p$ scattering. A minimum $Q^2$ cut, $Q^2_{min} = 3.5$ GeV$^2$, was imposed to
remain in the kinematic region where perturbative QCD should be applicable. This choice is also varied
when evaluating model uncertainties. It is also conventional to apply a minimum cut on $W$, the invariant
mass of the hadronic system, to avoid sensitivity to target mass and large-$x$ higher-twist contributions.
However, the HERA data have $W > 15$ GeV and $x < 0.65$, so that no further cuts are necessary.

PDFs were parametrised at the input scale by the generic form 

$$ x f(x) = A x^{B} (1-x)^{C} \left(1 + \epsilon\sqrt{x} + D x + E x^{2}\right). \qquad (15) $$
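As an illustration, the generic form of Eq. (15) can be evaluated numerically. The short sketch below is not part of the HERAPDF fit code; the function name `xf` is a label introduced here and the parameter values in the usage line are placeholders, not fitted values.

```python
import math


def xf(x, A, B, C, eps=0.0, D=0.0, E=0.0):
    """Input-scale form x f(x) = A x^B (1-x)^C (1 + eps*sqrt(x) + D x + E x^2)."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    polynomial = 1.0 + eps * math.sqrt(x) + D * x + E * x * x
    return A * x**B * (1.0 - x) ** C * polynomial


# Placeholder parameters for illustration only (not HERAPDF fit results).
example = xf(0.1, A=4.0, B=0.7, C=4.5, E=9.0)
```

Setting `eps`, `D` and `E` to zero recovers the minimal three-parameter shape; switching them on one at a time mirrors the way extra terms are tested in the fit.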

The parametrised PDFs are the gluon distribution $xg$, the valence quark distributions $xu_v$, $xd_v$, and the
u-type and d-type anti-quark distributions $x\bar{U}$, $x\bar{D}$. Here $x\bar{U} = x\bar{u}$, $x\bar{D} = x\bar{d} + x\bar{s}$, at the chosen
starting scale. The normalisation parameters, $A_g$, $A_{u_v}$, $A_{d_v}$, are constrained by the quark number
sum-rules and the momentum sum-rule. The $B$ parameters $B_{\bar{U}}$ and $B_{\bar{D}}$ are set equal, $B_{\bar{U}} = B_{\bar{D}}$, such that
there is a single $B$ parameter for the sea distributions. The strange quark distribution is expressed as an
$x$-independent fraction, $f_s$, of the d-type sea, $x\bar{s} = f_s x\bar{D}$ at $Q_0^2$. For $f_s = 0.5$ the $\bar{s}$ and $\bar{d}$ quark
densities would be the same, but the value $f_s = 0.31$ is chosen to be consistent with determinations of
this fraction using neutrino-induced di-muon production [65,66]. This choice is varied when evaluating
model uncertainties. The further constraint $A_{\bar{U}} = A_{\bar{D}}(1 - f_s)$, together with the requirement $B_{\bar{U}} = B_{\bar{D}}$,
ensures that $x\bar{u} \to x\bar{d}$ as $x \to 0$. For the HERAPDF1.0 and 1.5 NLO central fits, the valence $B$
parameters, $B_{u_v}$ and $B_{d_v}$, are also set equal, but this assumption is dropped for fits using the extended
parametrisation. The form of the gluon parametrisation is also extended for these latter fits such that a
term of the form $A'_g x^{B'_g}(1 - x)^{C'_g}$ is subtracted from the standard parametrisation, where $C'_g = 25$ is
fixed and $A'_g$ and $B'_g$ are fitted. This allows for the gluon distribution to become negative at low $x$, $Q^2$,



Variation            Standard Value    Lower Limit              Upper Limit

f_s                  0.31              0.23                     0.38
m_c [GeV]            1.4               1.35 (Q_0^2 = 1.8)       1.65
m_b [GeV]            4.75              4.3                      5.0
Q^2_min [GeV^2]      3.5               2.5                      5.0
Q_0^2 [GeV^2]        1.9               1.5 (f_s = 0.29)         2.5 (m_c = 1.6, f_s = 0.34)

Table 6: Standard values of input parameters and the variations considered.



although it does not do so within the kinematic range of the fitted data. The central fit is found by first
setting the $\epsilon$, $D$ and $E$ parameters to zero and then varying them, one at a time; the best fit is achieved for
$E_{u_v} \neq 0$. This is then adopted as standard and the other $\epsilon$, $D$ and $E$ parameters are then varied, one at a
time. However, these fits do not represent a significant improvement in fit quality for the HERAPDF1.0,
1.5 and 1.7 NLO fits, and thus a central fit with just $E_{u_v} \neq 0$ is chosen. For the HERAPDF1.5f, 1.6 and
HERAPDF1.5NNLO fits an extra parameter, $D_{u_v} \neq 0$, is used. The HERAPDF1.0 and 1.5 NLO fits have
10 parameters, the 1.5f, 1.6 NLO and 1.5NNLO fits have 14 parameters, and the HERAPDF1.7 NLO
fit has 13 parameters.
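The way a quark number sum rule fixes a normalisation parameter can be made concrete. Since the parametrised quantity is $xf(x)$, the number density is $f(x) = A x^{B-1}(1-x)^{C}(1 + \epsilon\sqrt{x} + Dx + Ex^2)$, and each term integrates to an Euler Beta function. The sketch below assumes the standard sum rules (two valence up quarks and one valence down quark in the proton); the function names and the numerical inputs are illustrative choices made here, not values or code from the HERAPDF fit.

```python
import math


def beta(a, b):
    """Euler Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def number_integral(B, C, eps=0.0, D=0.0, E=0.0):
    """Integral over 0 < x < 1 of f(x) for A = 1; requires B > 0."""
    return (
        beta(B, C + 1.0)
        + eps * beta(B + 0.5, C + 1.0)
        + D * beta(B + 1.0, C + 1.0)
        + E * beta(B + 2.0, C + 1.0)
    )


def valence_normalisation(n_quarks, B, C, eps=0.0, D=0.0, E=0.0):
    """Value of A for which the number sum rule integrates to n_quarks."""
    return n_quarks / number_integral(B, C, eps, D, E)


# Illustrative shape parameters (placeholders): u-valence with two quarks.
A_uv = valence_normalisation(2, B=0.7, C=4.5, E=9.0)
```

The momentum sum rule constrains $A_g$ in the same spirit, with $\int_0^1 x\,[\,\Sigma(x) + g(x)\,]\,dx = 1$ summed over all partons, so that only the shape parameters remain free in the fit.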

The assumptions made in setting the parameters for this central fit are now discussed: 

• In common with most PDF fits it is assumed that $q_{sea} = \bar{q}$.

• The HERAPDF parametrises $\bar{U}$ and $\bar{D}$ separately to allow for the fact that $\bar{u} \neq \bar{d}$ at high $x$, but the
restriction $x\bar{u} \to x\bar{d}$ as $x \to 0$ is imposed.

• The strange sea is suppressed. However, determinations of the degree of suppression are not
very accurate and hence the model uncertainty on this fraction is evaluated by allowing the variation
$0.23 < f_s < 0.38$.

• The u-valence and d-valence shapes are parametrised separately, but the form of the
parametrisation imposes $d_v/u_v = (1 - x)^p$ as $x \to 1$.

• The heavy quarks are treated using a General-Mass Variable-Flavour-Number Scheme. There is
some model uncertainty in the choice of the heavy quark masses. The ranges $1.35 < m_c <
1.65$ GeV and $4.3 < m_b < 5.0$ GeV are considered as model variations. There are also different
heavy quark schemes. The ACOT scheme [67] has been used as a cross-check to the Thorne-
Roberts scheme.

• All PDF extractions make choices concerning the fitted kinematic region, i.e. the minimum values
of $Q^2$, $W^2$, $x$. These choices can have small systematic effects on the PDF shapes extracted. The
choice of $Q^2_{min}$ is varied in the range $2.5 < Q^2_{min} < 5.0$ GeV$^2$.

• The PDFs extracted for $Q^2 \gg Q_0^2$ lose sensitivity to the exact form of the parametrisation at
$Q_0^2$. However, the choices of $Q_0^2$ and of the form of parametrisation represent a parametrisation
uncertainty. The HERAPDF uses the technique of saturation of the $\chi^2$, increasing the number of
parameters systematically until the $\chi^2$/ndf no longer decreases significantly. However, a number
of variations on the central fit parametrisation, which have similar fit quality, are considered in
order to give an estimate of the parametrisation uncertainty. The value of $Q_0^2$ is also varied in the
range $1.5 < Q_0^2 < 2.5$ GeV$^2$ for the same purpose.

Table 6 summarises the variations in numerical values considered when evaluating model
uncertainties on the HERAPDF. Note that the variations of $Q_0^2$ and $f_s$ are not independent, since QCD
evolution will ensure that the strangeness fraction increases as $Q_0^2$ increases. The value $f_s = 0.29$ is
used for $Q_0^2 = 1.5$ GeV$^2$ and the value $f_s = 0.34$ is used for $Q_0^2 = 2.5$ GeV$^2$ in order to be consistent
with the choice $f_s = 0.31$ at $Q_0^2 = 1.9$ GeV$^2$. The variations of $Q_0^2$ and $m_c$ are also not independent,
since $Q_0 < m_c$ is required in the fit programme. Thus when $m_c = 1.35$ GeV, the starting scale used is
$Q_0^2 = 1.8$ GeV$^2$. Similarly, when $Q_0^2 = 2.5$ GeV$^2$ the charm mass used is $m_c = 1.6$ GeV. In practice,
the variations of $f_s$, $m_c$, $m_b$ mostly affect the model uncertainty of the $x\bar{s}$, $x\bar{c}$, $x\bar{b}$ quark distributions,
respectively, and have little effect on other parton flavours. The differences between the central fit and
the fits corresponding to model variations of $m_c$, $m_b$, $f_s$, $Q^2_{min}$ are added in quadrature, separately for
positive and negative deviations, to represent the model uncertainty of the HERAPDF sets.
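The quadrature sum over model variations, taken separately for upward and downward shifts, can be sketched as follows. This is a minimal illustration of the bookkeeping at a single $(x, Q^2)$ point; the function name and the numbers in the usage lines are assumptions made here and are not HERAPDF results.

```python
import math


def model_uncertainty(central, variations):
    """Asymmetric model uncertainty at one (x, Q^2) point.

    central    : PDF value from the central fit
    variations : PDF values from the fits with one model input varied
    Returns (up, down): positive and negative deviations from the central
    value, each summed in quadrature on its own.
    """
    up = math.sqrt(sum((v - central) ** 2 for v in variations if v > central))
    down = math.sqrt(sum((central - v) ** 2 for v in variations if v < central))
    return up, down


# Hypothetical values of one PDF for variations of m_c, m_b, f_s and Q^2_min.
up, down = model_uncertainty(1.00, [1.03, 0.96, 1.04, 0.97])
```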

The variation in $Q_0^2$ is regarded as a parametrisation uncertainty, rather than a model uncertainty.
The variations of $Q_0^2$ mostly increase the PDF uncertainties of the sea and gluon at small $x$. At the
starting scale the gluon shape is valence-like, so for the downward variation of the starting scale, $Q_0^2 =
1.5$ GeV$^2$, a gluon parametrisation which explicitly allows for a negative gluon contribution at low $x$ is
considered for the 1.0 and 1.5 NLO fits; in all other HERAPDF fits it is already a standard part of the
parametrisation. Similarly a parametrisation variation, $B_{u_v} \neq B_{d_v}$, which is standard for the 1.5f, 1.6
and 1.7 NLO and the 1.5NNLO fits, is also allowed for the 1.0 and 1.5 NLO fits. This increases the
uncertainties on the valence quarks at low $x$. Finally, variation of the number of terms in the polynomial
$(1 + \epsilon\sqrt{x} + Dx + Ex^2)$ is considered for each fitted parton distribution. In practice only a small number of
these variations have significantly different PDF shapes from the central fit, notably: $D_{u_v} \neq 0$ (standard
for 1.5f, 1.6 NLO and 1.5NNLO), $D_{\bar{U}} \neq 0$ and $D_{\bar{D}} \neq 0$. These variations mostly increase the PDF
uncertainty at high $x$, but the valence PDFs at low $x$ are also affected because of the constraints of the
quark number sum rules. The difference between each of these parametrisation variations and the central fit
is stored and an envelope representing the maximal deviation at each $x$ value is constructed to represent
the parametrisation uncertainty.
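The envelope construction differs from the quadrature sum used for the model uncertainty: at each $x$ value only the largest upward and largest downward deviation are kept. A minimal sketch, with a hypothetical function name and made-up numbers, is:

```python
def parametrisation_envelope(central, variants):
    """Envelope of parametrisation variations on a common x grid.

    central  : list of PDF values from the central fit, one per x point
    variants : list of lists, one per parametrisation variation
    Returns (up, down): the maximal positive and negative deviation at each
    x point (zero where no variation lies on that side of the central fit).
    """
    up, down = [], []
    for i, c in enumerate(central):
        deviations = [variant[i] - c for variant in variants]
        up.append(max([0.0] + deviations))
        down.append(max([0.0] + [-d for d in deviations]))
    return up, down


# Two hypothetical variations evaluated at two x points.
up, down = parametrisation_envelope([1.0, 2.0], [[1.1, 1.9], [0.8, 2.3]])
```

Because the maximum is taken point by point, the upper and lower edges of the envelope can come from different variations at different $x$.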

The HERAPDF uses a form of the $\chi^2$ specified in ref @ to perform the fit of the predictions to
the HERA data. The consistency of the input data justifies the use of the conventional $\chi^2$ tolerance,
$\Delta\chi^2 = 1$, when determining the 68% C.L. experimental uncertainties on the HERAPDF1.0 fit. Modern
deep inelastic scattering experiments have very small statistical uncertainties, so that the contribution
of correlated systematic uncertainties has become dominant for individual data sets and consideration
of the treatment of such errors is essential. However, the HERA data combination has changed this
situation. The combination of the H1 and ZEUS data sets has resulted in a data set for NC and CC
$e^+p$ and $e^-p$ scattering with correlated systematic uncertainties which are smaller than or comparable to the
statistical and uncorrelated uncertainties. Thus the central values and experimental uncertainties on the
PDFs which are extracted from the combined data depend only weakly on the method of treatment
of correlated systematic uncertainties in the fitting procedure. For the HERAPDF1.0 (1.5) NLO central
fit, the 110 (131) systematic uncertainties which result from the ZEUS and H1 data sets are combined
in quadrature, and the three sources of uncertainty which result from the combination procedure are
treated as correlated by the Offset method [68]. The resulting experimental uncertainties on the PDFs
are small. For the HERAPDF1.5f, 1.6, 1.7 NLO fits and the HERAPDF1.5 NNLO fit it was decided to
treat the three procedural errors as correlated by the Hessian method [68]. This has a negligible effect
on the size of the experimental uncertainties and a small effect on the resulting $\chi^2$ value, see Sec. 4.1.4.
The total PDF uncertainty is obtained by adding in quadrature the experimental, model and parametrisation
uncertainties.
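Written out, the quadrature sum described above takes the following form. This is only a sketch of the bookkeeping: the symbols $\delta_{\rm exp}$, $\delta_{\rm model}^{\pm}$ and $\delta_{\rm param}^{\pm}$ are labels introduced here, and the separate treatment of upward (+) and downward (-) deviations is an assumption based on the asymmetric model and parametrisation uncertainties.

```latex
\delta_{\rm tot}^{\pm} \;=\;
\sqrt{\,\delta_{\rm exp}^{\,2}
      + \left(\delta_{\rm model}^{\pm}\right)^{2}
      + \left(\delta_{\rm param}^{\pm}\right)^{2}\,}
```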

4.1 Results from the HERAPDF fit 

4.1.1 HERAPDF 1.0 

We first discuss results from the published HERAPDF1.0 fit. This fit has a $\chi^2$ per degree of freedom of
574/582. Fig. 22 shows summary plots of the HERAPDF1.0 PDFs at $Q^2 = 10$ GeV$^2$.

Figs. 23-25 show the HERAPDF1.0 distributions, $xu_v$, $xd_v$, $xS$, $xg$, as a function of $x$ at $Q^2 =
1.9$, 10 and 10000 GeV$^2$, where $xS = 2x(\bar{U} + \bar{D})$ is the sea PDF. Note that for $Q^2 > m_c^2$, $x\bar{U} = x\bar{u} + x\bar{c}$, and
for $Q^2 > m_b^2$, $x\bar{D} = x\bar{d} + x\bar{s} + x\bar{b}$, so that the heavy quarks are included in the sea distributions. The
break-up of $xS$ into the flavours $xu_{sea} = 2x\bar{u}$, $xd_{sea} = 2x\bar{d}$, $xs_{sea} = 2x\bar{s}$, $xc_{sea} = 2x\bar{c}$, $xb_{sea} = 2x\bar{b}$
is illustrated so that the relative importance of each flavour at different $Q^2$ may be assessed. Fractional
uncertainty bands are shown below each PDF. The experimental, model and parametrisation uncertainties
are shown separately. The model and parametrisation uncertainties are asymmetric. For the sea and
gluon distributions, the variations in parametrisation which have non-zero $\epsilon$, $D$ and $E$ affect the large-$x$
region, and the uncertainties arising from the variation of $Q_0^2$ and $Q^2_{min}$ affect the small-$x$ region. For
the valence distributions the non-zero $\epsilon$, $D$ and $E$ parametrisation uncertainty is important for all $x$, and
is their dominant uncertainty. The total uncertainties at low $x$ decrease with increasing $Q^2$ due to QCD
evolution, resulting, for instance, in 2% uncertainties for $xg$ at $Q^2 = 10000$ GeV$^2$ for $x < 0.01$.

Fig. 22: The parton distribution functions from HERAPDF1.0, $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at $Q^2 = 10$ GeV$^2$. The
experimental, model and parametrisation uncertainties are shown separately. The gluon and sea distributions are scaled down
by a factor 20.

The break-up of the PDFs into different flavours is further illustrated in Fig. 26, where the quark
distributions $x\bar{u}$, $x\bar{d}$, $x\bar{c}$, $x\bar{s}$ are shown at $Q^2 = 10$ GeV$^2$. The u flavour is better constrained than the
d flavour because of the dominance of this flavour in all interactions except $e^+p$ CC scattering. The
quark distribution $x\bar{s}$ is derived from $x\bar{D}$ through the assumption on the value of $f_s$, and the uncertainty
on $x\bar{s}$ directly reflects the uncertainty on this fraction. The charm PDF, $x\bar{c}$, is strongly related to the
gluon density such that it is affected by the same variations which affect the gluon PDF (variation of $Q_0^2$
and $Q^2_{min}$) as well as by the variation of $m_c$. The uncertainty on the bottom PDF, $x\bar{b}$ (not shown), is
dominated by the variation of $m_b$.

The shapes of the gluon and the sea distributions can be compared by considering Figs. 23-25. For
$Q^2 \gtrsim 10$ GeV$^2$, the gluon density rises dramatically towards low $x$ and this rise increases with increasing
$Q^2$. This rise is one of the most striking discoveries of HERA. However, at low $Q^2$ the gluon shape
flattens at low $x$. At $Q^2 = 1.9$ GeV$^2$, the gluon shape becomes valence-like and the parametrisation
variation which includes a negative gluon term increases the uncertainty on the gluon at low $x$. However,
the gluon distribution itself is not negative in the fitted kinematic region.

The uncertainty in the sea distribution is considerably less than that of the gluon distribution. For
$Q^2 > 5$ GeV$^2$, the gluon density becomes much larger than the sea density, but for lower $Q^2$ the sea
density continues to rise at low $x$, whereas the gluon density is suppressed. This may be a signal that
the application of the DGLAP NLO formalism for $Q^2 < 5$ GeV$^2$ is questionable. Kinematically, low-$Q^2$
HERA data are also at low $x$, and the DGLAP formalism may be inadequate at low $x$ since it is missing
$\ln(1/x)$ resummation terms and possible non-linear effects, see Ref. [69]. Discussion of this topic is
beyond the scope of the present review. PDF fits within the DGLAP formalism are successful down to
$Q^2 \sim 2$ GeV$^2$ and $x \sim 10^{-4}$ and this is the kinematic region considered in the present review.

4.1.2 Including Heavy Quark data in PDF fits 

The HERA combined charm data have been presented in Sec. 3.4. Fig. 16 shows the comparison of the
HERA combined measurements of $F_2^{c\bar{c}}$ with the predictions of the HERAPDF1.0 fit. These data can of
Fig. 23: The parton distribution functions from HERAPDF1.0, $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at $Q^2 = 1.9$ GeV$^2$. The
break-up of the sea PDF, $xS$, into the flavours $xu_{sea} = 2x\bar{u}$, $xd_{sea} = 2x\bar{d}$, $xs_{sea} = 2x\bar{s}$, $xc_{sea} = 2x\bar{c}$ is illustrated.
Fractional uncertainty bands are shown below each PDF. The experimental, model and parametrisation uncertainties are shown
separately.

Fig. 24: The parton distribution functions from HERAPDF1.0, $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at $Q^2 = 10$ GeV$^2$. The
break-up of the sea PDF, $xS$, into the flavours $xu_{sea} = 2x\bar{u}$, $xd_{sea} = 2x\bar{d}$, $xs_{sea} = 2x\bar{s}$, $xc_{sea} = 2x\bar{c}$ is illustrated.
Fractional uncertainty bands are shown below each PDF. The experimental, model and parametrisation uncertainties are shown
separately.

Fig. 25: The parton distribution functions from HERAPDF1.0, $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at $Q^2 = 10000$ GeV$^2$. The
break-up of the sea PDF, $xS$, into the flavours $xu_{sea} = 2x\bar{u}$, $xd_{sea} = 2x\bar{d}$, $xs_{sea} = 2x\bar{s}$, $xc_{sea} = 2x\bar{c}$, $xb_{sea} = 2x\bar{b}$ is
illustrated. Fractional uncertainty bands are shown below each PDF. The experimental, model and parametrisation uncertainties
are shown separately.

Fig. 26: The parton distribution functions from HERAPDF1.0, $x\bar{u}$, $x\bar{d}$, $x\bar{c}$, $x\bar{s}$, at $Q^2 = 10$ GeV$^2$. Fractional uncertainty bands
are shown below each PDF. The experimental, model and parametrisation uncertainties are shown separately.

Fig. 27: $\chi^2$ scan vs the charm quark mass $m_c$ for the HERAPDF1.0 fit to just HERA-I inclusive data (left) and a fit which also
includes combined HERA $F_2^{c\bar{c}}$ data (right). Both of these fits use the RT-VFN heavy quark scheme.



Scheme           $m_c$(minimum)            $\chi^2$/ndp for $F_2^{c\bar{c}}$

RT Standard      $1.58_{-0.03}$            42.0/41
RT Optimized     $1.46^{+0.02}$            46.5/41
ACOT-full        $1.58^{+0.03}_{-0.04}$    59.9/41
S-ACOT-$\chi$    $1.26^{+0.02}_{-0.04}$    68.5/41
ZM-VFN           [illegible]               88.1/41

Table 7: Charm mass parameters and $\chi^2$ values per number of data points (ndp) for fits to $F_2^{c\bar{c}}$ data using various heavy quark
schemes.



course be included in the fit. There are 41 charm data points and, for the preliminary combination, these
are provided with uncorrelated systematic errors and a single combined source of correlated error, which
was treated by the Offset method. The $\chi^2$ for the inclusive data is hardly changed by the addition of
the charm data, but the $\chi^2$ for the charm data is very sensitive to the charm mass and the scheme
used for heavy flavour treatment [70]. It is found that in order to obtain a good fit using the standard
Thorne-Roberts Variable Flavour Number Scheme (RT-VFN) [70] it is necessary to increase the standard
value of the charm mass. Fig. 27 shows a scan of the $\chi^2$ of the HERAPDF1.0 fit to the inclusive HERA-I
data vs the charm quark mass parameter entering into the standard RT-VFN scheme. In the same figure a
scan for a similar fit to the inclusive HERA-I data plus the combined $F_2^{c\bar{c}}$ data is shown. The sensitivity
of the charm data to the charm quark mass parameter is clear.
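A $\chi^2$ scan of this kind is commonly summarised by the position of its minimum and the $\Delta\chi^2 = 1$ interval around it. The sketch below does this with a parabola through three scan points; it is an assumption-laden simplification (real scans may be asymmetric and are usually fitted over more points), and both the function name and the numbers are invented for illustration.

```python
import math


def parabola_minimum(points):
    """Fit chi2(m) = a*m**2 + b*m + c through three (m, chi2) scan points.

    Returns (m_min, chi2_min, sigma), where sigma is the half-width of the
    Delta chi2 = 1 interval of the fitted parabola.
    """
    (x1, y1), (x2, y2), (x3, y3) = sorted(points)
    d1 = (y2 - y1) / (x2 - x1)  # first divided differences
    d2 = (y3 - y2) / (x3 - x2)
    a = (d2 - d1) / (x3 - x1)   # curvature of the parabola
    if a <= 0.0:
        raise ValueError("scan points are not convex; no minimum found")
    b = d1 - a * (x1 + x2)
    c = y1 - a * x1 * x1 - b * x1
    m_min = -b / (2.0 * a)
    chi2_min = c - b * b / (4.0 * a)
    sigma = 1.0 / math.sqrt(a)  # a * sigma**2 = 1 defines Delta chi2 = 1
    return m_min, chi2_min, sigma


# Made-up scan points (charm mass in GeV, chi2), not taken from the figures.
m_min, chi2_min, sigma = parabola_minimum([(1.4, 41.0), (1.5, 40.0), (1.6, 41.0)])
```

A steep parabola (large curvature) gives a narrow interval, which is the quantitative meaning of the statement that the charm data are sensitive to the mass parameter.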

However, the Standard RT-VFN scheme is not the only possible heavy quark scheme. The fit to
HERA-I inclusive plus $F_2^{c\bar{c}}$ data has been repeated for the Optimized RT-VFN scheme [71], the full
ACOT scheme, the S-ACOT-$\chi$ scheme [72] and the Zero-Mass Variable Flavour Number Scheme
(ZM-VFN), in which light-quark coefficient functions are used for the heavy quarks, which are simply turned
on at threshold $Q^2 \sim m_c^2$. Fig. 28 shows the $\chi^2$ scan for these different heavy quark schemes. It can be
seen that all schemes, bar the ZM-VFN, give acceptable fits, and that each scheme has its own preferred
value of the charm quark mass. These values and the corresponding $\chi^2$ values are given in Table 7. Fig. 29
shows these fits compared to the charm data.

Predictions for $W^+$, $W^-$, $Z$ production at the LHC are sensitive to the value of the charm mass
and to the heavy quark scheme used, as illustrated in Fig. 28. For any chosen value of the charm mass
the spread of predictions for different schemes is $\sim 7$%. However, if each prediction is used at its own
favoured value of the charm mass then this spread is reduced to $\sim 2$% and, if the disfavoured ZM-VFN
is excluded, to $\lesssim 1$%.
Fig. 28: Left: $\chi^2$ scan vs the charm quark mass $m_c$ for a fit to HERA-I inclusive and $F_2^{c\bar{c}}$ data for various heavy quark
schemes. Right: Predictions for the $W$ cross-section at the LHC (7 TeV) for these schemes vs $m_c$. The value of the charm
mass parameter which gives the minimum $\chi^2$ is marked by a star for each scheme.

Fig. 29: $F_2^{c\bar{c}}$ data compared to the predictions of various heavy quark schemes, within the HERAPDF fit formalism.

Fig. 30: The combined HERA data from running with proton beam energies $E_p = 460$ GeV and $E_p = 575$ GeV are shown for a
few low-$Q^2$ bins, compared to predictions from PDF fits to these data and the combined HERA-I high energy data. Predictions
are shown for data subject to two different minimum $Q^2$ cuts: 3.5 GeV$^2$ and 5.0 GeV$^2$.

4.1.3 Including Low Energy run data in PDF fits 

The preliminary combined data from the low energy running described in Sec. 3.3 have been input to the
HERAPDF fit together with the HERA-I combined high energy data. These data have 25 sources of
correlated systematic uncertainty from the individual experiments, and 3 procedural sources of systematic
uncertainty, similarly to the high energy combination. These correlated errors are added in quadrature,
except for the 3 procedural sources, which are treated as fully correlated by the Hessian method. There are
224 combined data points on the NC $e^+p$ cross section from the low energy proton beam running and
when they are fitted together with the 592 combined data points from the HERA-I running the $\chi^2$/ndf is
845.7/806 for 10 parameters. The partial $\chi^2$/ndp are 588/592 for the high energy inclusive data and
257.6/224 for the low energy inclusive data. These data are sensitive to the minimum $Q^2$ cut imposed, as
illustrated in Fig. 30. A better fit is obtained with a larger cut, $Q^2 > 5$ GeV$^2$. The partial $\chi^2$/ndp after
this cut are 527.1/566 for the high energy data and 200/215 for the low energy data. The data at low $x$, $Q^2$
access high $y$ and are thus sensitive to the longitudinal structure function $F_L$. Because of the close relationship of $F_L$ and the gluon
PDF these data should affect the gluon PDF. This is illustrated in Fig. 31, which shows that variation of
the $Q^2$ cut affects the gluon PDF more for the fit including low energy data, since the result is outside the
error bands which include this cut variation for the HERAPDF1.0 fit. Kinematically, cutting out low-$Q^2$
data also implies cutting out data at the lowest $x$, and the data are similarly sensitive to an $x > 0.0005$ cut.
Data at low $x$ may not be well fitted by the DGLAP formalism since this is missing $\ln(1/x)$ resummation
terms and possible non-linear effects. Fig. 31 also illustrates sensitivity to a 'saturation' inspired cut of
$Q^2 > 1.0\,x^{-0.3}$ GeV$^2$. However, one cannot claim that any break-down of the DGLAP formalism has
yet been observed, since if the HERAPDF1.0 formalism is generalised to the extended parametrisation
with 14 parameters, then the increased uncertainty in the low-$x$ gluon, illustrated in Fig. 34, covers the
sensitivity of the low energy data to the low $x$, $Q^2$ cuts.
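The kinematic selections discussed above are simple enough to express as a filter on $(x, Q^2)$. The sketch below is illustrative only: the function name, its arguments and the treatment of points lying exactly on a cut boundary are assumptions made here, while the default $Q^2_{min} = 3.5$ GeV$^2$ and the form $Q^2 > 1.0\,x^{-0.3}$ GeV$^2$ are taken from the text.

```python
def passes_cuts(x, q2, q2_min=3.5, saturation_cut=False, coeff=1.0, power=0.3):
    """Return True if a data point at (x, Q^2 in GeV^2) survives the cuts.

    q2_min         : minimum Q^2 cut in GeV^2 (3.5 standard, 5.0 harder cut)
    saturation_cut : if True, additionally require Q^2 > coeff * x**(-power)
    """
    if q2 < q2_min:
        return False
    if saturation_cut and q2 <= coeff * x ** (-power):
        return False
    return True


# A low-x point at Q^2 = 4 GeV^2 under the different selections.
standard = passes_cuts(1e-4, 4.0)                         # kept
harder = passes_cuts(1e-4, 4.0, q2_min=5.0)               # removed
saturation = passes_cuts(1e-4, 4.0, saturation_cut=True)  # removed
```

At $x = 10^{-4}$ the saturation-inspired boundary sits near $Q^2 \approx 16$ GeV$^2$, so this cut removes far more of the low-$x$ data than either fixed $Q^2$ cut.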

4.1.4 HERAPDF 1.5 

The HERAPDF1.5 NLO fit uses the same formalism as HERAPDF1.0 but includes the preliminary
HERA-I+II data. The $\chi^2$ per degree of freedom for the HERAPDF1.5 NLO central fit is 760/664, where the
increased $\chi^2$ reflects the greater accuracy of the HERA-I+II combination. This fit has already been
compared to the data in Figs. 8-10. The improvement to the PDFs is illustrated in Fig. 32, which shows the
HERAPDF1.5 in a format such that it may be directly compared with HERAPDF1.0 in Fig. 22. Fig. 33
shows the HERAPDF1.5 overlaid on HERAPDF1.0 on a linear $x$ scale, such that the improvement at
high $x$ may be clearly seen.
Fig. 31: The parton distribution functions from HERAPDF1.0, $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at $Q^2 = 10$ GeV$^2$. The total
uncertainties are shown. The gluon and sea distributions are scaled down by a factor 20. (Left) The lines overlaid show the
results of fits to the HERA-I data plus the low energy running data with the standard minimum $Q^2$ cut of 3.5 GeV$^2$ and with a
harder cut of 5.0 GeV$^2$. (Right) The lines overlaid show the results of fits to the HERA-I data and to the HERA-I plus the
low energy running data, with the 'saturation inspired' cut of $Q^2 > 1.0\,x^{-0.3}$ GeV$^2$.

Fig. 32: The parton distribution functions from HERAPDF1.5, $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at $Q^2 = 10$ GeV$^2$. The
experimental, model and parametrisation uncertainties are shown separately. The gluon and sea distributions are scaled down
by a factor 20.

Fig. 33: The parton distribution functions from HERAPDF1.5, $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at $Q^2 = 10$ GeV$^2$, overlaid
on the parton distributions from HERAPDF1.0. The total uncertainties for each PDF are shown. A linear scale in $x$ is used to
emphasize the reduction in uncertainties for HERAPDF1.5 at high $x$.

The fit formalism was also extended to include more PDF parameters, as already described in
Sec. 4. This fit, called the HERAPDF1.5f NLO fit, has a $\chi^2$ per degree of freedom of 730/664, where the
improvement to the $\chi^2$ is mostly due to the treatment of the three procedural systematic errors by the
Hessian rather than the Offset method. There is a small decrease in $\chi^2$, $\Delta\chi^2 = -5$, due to the increase
of the number of parameters from 10 to 14. The PDFs of the HERAPDF1.5f and 1.5 NLO fits are
compared in Fig. 34, where one can see that the extra freedom in the parametrisation does not change
the central values of the PDFs significantly. The total size of the PDF uncertainties is also not changed
significantly, although some of the parametrisation uncertainty in HERAPDF1.5 is now included in the
experimental uncertainty in HERAPDF1.5f. The most significant change to the uncertainties is a modest
increase in the uncertainty of the low-$x$ gluon. This covers the sensitivity to low-$x$, $Q^2$ cuts found in the
low energy data combination, see Sec. 4.1.3.

The HERAPDF1.5 NNLO fit was performed on the same preliminary combined HERA I+II data.
The $\chi^2$ per degree of freedom for the HERAPDF1.5 NNLO central fit is 740/664. For this NNLO
fit the addition of extra parameters made a significant difference to the $\chi^2$. The change from a 10 to a 14
parameter fit results in a change of $\Delta\chi^2 = -32$, with the largest difference coming from the addition
of the term which allows freedom in the low-$x$ gluon. Fig. 35 compares the HERAPDF1.5 NNLO fit to
HERAPDF1.0 NNLO, which was an NNLO version of the HERAPDF1.0 using just 10 parameters and
fitting just HERA-I data. One can see that the extra parameters give somewhat different shapes to the
valence quarks and a much harder high-$x$ gluon PDF.

Fig. 36 compares the HERAPDF1.5NNLO fit to the corresponding NLO fit, HERAPDF1.5f. These
fits have the same number of parameters. The change from NLO to NNLO gives a somewhat steeper sea
and softer gluon at low $x$, consistent with the different rates of evolution at NNLO. The most striking
difference is the greater level of uncertainty at low $x$ for the NNLO fit. This is mostly due to sensitivity
to the low $Q^2$ cut on the data. One might have expected that an NNLO fit would fit low $x$, $Q^2$ data better
than an NLO fit; however, this would seem not to be the case, see also ref. [73].

4.1.5 Including jet data in PDF fits: HERAPDF1.6 

The gluon PDF contributes only indirectly to the inclusive DIS cross sections. However, the QCD
processes that give rise to scaling violations in the inclusive cross sections, namely the QCD-Compton
(QCDC) and boson-gluon-fusion (BGF) processes, can also be observed as events with distinct jets in
the final state, provided that the energy and momentum transfer are large enough. The cross section for
QCDC scattering depends on $\alpha_s(M_Z)$ and the quark PDFs. The cross section for the BGF process
depends on $\alpha_s(M_Z)$ and the gluon PDF. These two processes are dominant in different kinematic regions.
Thus jet cross sections give new information about the PDFs. For the inclusive data, the correlation
between $\alpha_s(M_Z)$ and the gluon PDF limits the accuracy with which either can be determined. The jet
data bring new information which helps to reduce the overall correlation.

Fig. 34: The parton distribution functions from HERAPDF1.5 and HERAPDF1.5f, $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at
$Q^2 = 10$ GeV$^2$. Fractional uncertainty bands are shown below each PDF. The experimental, model and parametrisation
uncertainties are shown separately.

Fig. 35: The parton distribution functions from HERAPDF1.5 NNLO, $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at $Q^2 = 10$ GeV$^2$,
compared to HERAPDF1.0 NNLO. A linear scale in $x$ is used to emphasize the differences at high $x$.

Fig. 36: The parton distribution functions from HERAPDF1.5NNLO, $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at $Q^2 = 10$ GeV$^2$,
compared to the HERAPDF1.5f NLO fit. The gluon and sea distributions are scaled down by a factor 20.

In the HERAPDF1.6 NLO PDF fit the jet data sets presented in Sec.[33] are fitted together with the
preliminary HERA I+II combined inclusive data. These data sets have 12 correlated systematic errors,
which are treated as fully correlated by the Hessian method. The predictions for the jet cross sections
have been calculated to NLO in QCD using the NLOjet++ program [74] and have been input to the fit
by the FASTNLO interface [75]. The calculation of the NLO jet cross sections is too slow to be used
iteratively in a fit. Thus NLOjet++ is used to compute LO and NLO weights which are independent of
$\alpha_s$ and the PDFs. The FASTNLO program then calculates the NLO QCD cross sections by convoluting
these weights with the PDFs and $\alpha_s$. The predictions must be multiplied by hadronisation corrections
before they can be used to fit the data. These were determined by using Monte Carlo (MC) programmes
which model parton hadronisation, to estimate the ratio of the hadron- to parton-level cross sections for
each bin. The hadronisation corrections are generally within a few percent of unity. The predictions for
jet production were also corrected for $Z^0$ contributions.
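The idea of precomputed weights can be illustrated with a toy calculation. The sketch below is emphatically not the FASTNLO interface or its grid format: the function name, the single-PDF simplification, the grid of $x$ nodes and all numbers are hypothetical. It only shows why the expensive step is done once, while each fit iteration reduces to a cheap weighted sum over the current PDF and $\alpha_s$.

```python
def jet_cross_section(weights_by_order, x_nodes, pdf, alpha_s, hadronisation=1.0):
    """Toy grid convolution for a single jet cross-section bin.

    weights_by_order : dict mapping the power n of alpha_s to a list of
                       weights, one per x node; computed once, independent
                       of alpha_s and of the PDF
    x_nodes          : x values at which the weights were tabulated
    pdf              : callable returning the parton density at x
    hadronisation    : multiplicative hadron- to parton-level correction
    """
    parton_level = 0.0
    for power, weights in weights_by_order.items():
        if len(weights) != len(x_nodes):
            raise ValueError("one weight per x node is required")
        convolution = sum(w * pdf(x) for w, x in zip(weights, x_nodes))
        parton_level += alpha_s**power * convolution
    return hadronisation * parton_level


# Hypothetical weights: LO enters at alpha_s^1 and the NLO term at alpha_s^2.
x_nodes = [0.1, 0.2]
weights = {1: [1.0, 2.0], 2: [0.5, 0.0]}
sigma = jet_cross_section(weights, x_nodes, lambda x: 10.0 * x,
                          alpha_s=0.1, hadronisation=1.02)
```

Changing the PDF parameters or $\alpha_s$ between fit iterations only changes the arguments of this sum, never the weights themselves.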

The fit is done with the same settings as for the HERAPDF1.5f fit. The $\chi^2$/ndf for the fit is
812/766, for a fit to 674 inclusive data points and 106 jet data points with 14 parameters. The partial $\chi^2$
of the data sets is 730/674 for the inclusive data and 82/106 for the jet data. Fig. 37 shows the parton
distributions and their uncertainties for the HERAPDF1.6 fit. HERAPDF1.5f is also shown on this plot
as a blue line. The fit with jets has rather similar central PDF values to the fit without jets, apart from
having a somewhat less hard high-$x$ sea. The uncertainties are also similar to those of HERAPDF1.5f,
with a slightly reduced uncertainty on the high-$x$ gluon. The quality of the fit to the jet data establishes
that NLO QCD is able simultaneously to describe both inclusive cross sections and jet cross sections,
thereby providing a compelling demonstration of QCD factorisation.
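The fit-quality numbers quoted for HERAPDF1.6 can be cross-checked with simple bookkeeping: the number of degrees of freedom is the total number of data points minus the number of fitted parameters, and the partial $\chi^2$ values add up to the total. The helper name below is hypothetical; the inputs are the numbers quoted in the text, with the 15-parameter case assuming that a free $\alpha_s(M_Z)$ counts as one extra parameter.

```python
def degrees_of_freedom(points_per_data_set, n_parameters):
    """ndf = total number of fitted data points minus number of parameters."""
    return sum(points_per_data_set) - n_parameters


ndf_fixed_alphas = degrees_of_freedom([674, 106], 14)  # inclusive + jet points
ndf_free_alphas = degrees_of_freedom([674, 106], 15)   # alpha_s as extra parameter
chi2_total = 730 + 82                                   # partial chi2 values

print(f"chi2/ndf = {chi2_total}/{ndf_fixed_alphas}")
```

The same arithmetic reproduces the 806 degrees of freedom quoted for the 10-parameter fit to 592 high energy and 224 low energy points.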

The standard value of $\alpha_s(M_Z)$ used in the fits has been $\alpha_s(M_Z) = 0.1176$. The correlation
between $\alpha_s(M_Z)$ and the gluon PDF is too strong to allow an accurate determination of $\alpha_s(M_Z)$ using purely
inclusive data, but the jet data are sensitive to $\alpha_s(M_Z)$ such that it may be left as a free parameter of the
fit. The value which results is $\alpha_s(M_Z) = 0.1202 \pm 0.0013\,({\rm exp}) \pm 0.0007\,({\rm model/param}) \pm
0.0012\,({\rm had})\,^{+0.0045}_{-0.0036}\,({\rm scale})$. We estimate the model and parametrisation uncertainties for
$\alpha_s(M_Z)$ in the same way as for the PDFs and we also add the uncertainties in the hadronisation
corrections applied to the jets. The scale uncertainties are estimated by varying the renormalisation and
factorisation scales chosen in the jet publications by a factor of two up and down. The dominant
contribution to the uncertainty comes from the jet renormalisation scale variation. Fig. 38 shows a $\chi^2$ scan
vs $\alpha_s(M_Z)$ for the fits with and without jets, illustrating how much better $\alpha_s(M_Z)$ is determined when
jet data are included. The model and parametrisation errors are also much better controlled.
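To see how strongly the scale variation dominates, the quoted components can be combined. The text does not quote a total, so the quadrature sum below is an assumption made here for illustration (it treats the four sources as independent), and the function name is hypothetical.

```python
import math


def combine_in_quadrature(*components):
    """Square root of the sum of squares of independent uncertainties."""
    return math.sqrt(sum(c * c for c in components))


# Uncertainty components on alpha_s(M_Z) as quoted in the text.
exp, model_param, had = 0.0013, 0.0007, 0.0012
scale_up, scale_down = 0.0045, 0.0036

total_up = combine_in_quadrature(exp, model_param, had, scale_up)
total_down = combine_in_quadrature(exp, model_param, had, scale_down)
without_scale = combine_in_quadrature(exp, model_param, had)

print(f"alpha_s(M_Z) = 0.1202 +{total_up:.4f} -{total_down:.4f}")
```

Under this assumption the combined uncertainty is roughly +0.0049 and -0.0041, compared with about 0.0019 when the scale variation is left out.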

The $\chi^2$ for the HERAPDF1.6 fit with free $\alpha_s(M_Z)$ is 807.6 for 765 degrees of freedom. The
partial $\chi^2$ for the inclusive data has barely changed, but the partial $\chi^2$ for the jet data decreases to 77.6
for 106 data points. Fig. 39 shows the summary plots of the PDFs for HERAPDF1.5f and HERAPDF1.6,
each with $\alpha_s(M_Z)$ left free in the fit. It can be seen that without jet data the uncertainty on the gluon
PDF at low $x$ is large due to the strong correlation between the low-$x$ shape of the gluon PDF and
$\alpha_s(M_Z)$. However, once jet data are included the extra information on gluon-induced processes reduces
this correlation and the resulting uncertainty on the gluon PDF is not much larger than it is for fits with
$\alpha_s(M_Z)$ fixed.

Direct photoproduction dijet cross sections have also been used in PDF fits to constrain the gluon,
as for example in the ZEUS-jets analysis of ZEUS inclusive cross-section data and jet data [76]. However,
such data have not yet been used in the HERAPDF fits because the cross-section predictions for
photoproduced jets are sensitive to the choice of the input photon PDFs. In order to minimise sensitivity to
this choice, the analysis can be restricted to use only the 'direct' photoproduction cross sections. These
are defined by the cut $x_\gamma^{\rm obs} > 0.75$, where $x_\gamma^{\rm obs}$ is a measure of the fraction of the
photon's momentum that enters into the hard scatter. This is a direction for further study.

Fig. 37: The parton distribution functions from HERAPDF1.6, $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at
$Q^2 = 10$ GeV$^2$. Fractional uncertainty bands are shown below each PDF. The experimental, model and
parametrisation uncertainties are shown separately; the central fit of HERAPDF1.5f is shown as a blue line.

4.1.6 Bringing it all together: HERAPDF1.7

Finally an NLO fit has been made bringing together all the data sets: HERA I+II combined high energy
data, combined low energy running data, $F_2^{c\bar{c}}$ data and jet data. This fit is called HERAPDF1.7 [54].
The charm data are fit in the optimized version of the RT heavy quark scheme, with its preferred value
of $m_c = 1.5$ GeV. The value of $\alpha_s(M_Z) = 0.119$ is fixed. This value gives the best fit to all the data in
this fit, with the jet data dominating the sensitivity. Other settings are as for the HERAPDF1.6 fit except
that the parameter $D_{u_v}$ was found to be consistent with zero and hence only 13 parameters have been
used for the HERAPDF1.7 fit. The correlated systematic uncertainties of the data sets are treated as for
the individual fits: for the inclusive combined data sets at both low and high energy only the procedural
errors are treated as correlated by the Hessian method. For the $F_2^{c\bar{c}}$ data one source is treated as
correlated by the offset method and for the jet data all 12 sources are treated as correlated by the Hessian
method.

Fig. 38: The difference between $\chi^2$ and its minimum value for the HERAPDF1.5f and HERAPDF1.6 fits as a
function of $\alpha_s(M_Z)$.

Fig. 39: The parton distribution functions $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at $Q^2 = 10$ GeV$^2$, from
HERAPDF1.5f and HERAPDF1.6, both with $\alpha_s(M_Z)$ treated as a free parameter of the fit. The experimental,
model and parametrisation uncertainties are shown separately. The gluon and sea distributions are scaled
down by a factor 20.

The overall $\chi^2$/ndf is 1097.6/1032 with partial $\chi^2$/ndp of: 44.1/41 for $F_2^{c\bar{c}}$ data; 226.6/224 for
low energy data; 80.6/106 for jet data and 746/674 for HERA-I+II high energy data. The data are all
very compatible. The results of this combined fit are illustrated in Fig. 40.
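
As a cross-check of the bookkeeping, the quoted numbers of data points and the 13 free parameters can be combined to recover the number of degrees of freedom. The sketch below uses only the values quoted in the text; the dictionary keys are illustrative labels.

```python
# Partial chi^2 and number of data points (ndp) quoted for HERAPDF1.7.
partials = {
    "F2cc": (44.1, 41),
    "low energy": (226.6, 224),
    "jets": (80.6, 106),
    "high energy": (746.0, 674),
}
N_PARAMETERS = 13               # free PDF parameters quoted in the text
QUOTED_TOTAL = (1097.6, 1032)   # overall chi^2 and ndf quoted in the text

ndp_total = sum(ndp for _, ndp in partials.values())
ndf = ndp_total - N_PARAMETERS  # degrees of freedom = data points - parameters
chi2_sum = sum(chi2 for chi2, _ in partials.values())

for name, (chi2, ndp) in partials.items():
    print(f"{name:12s} chi2/ndp = {chi2 / ndp:.2f}")
print(f"ndp = {ndp_total}, ndf = {ndf}, chi2/ndf = {QUOTED_TOTAL[0] / ndf:.3f}")
print(f"sum of quoted partial chi2 = {chi2_sum:.1f}")
```

The data points minus the free parameters reproduce the quoted 1032 degrees of freedom, and the partial values sum to the quoted total within the precision to which they are given.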



4.2 Comparison of HERAPDF to other PDFs 

Fig. 41 compares HERAPDF1.5 to MSTW08 (H, CTEQ6.6 G2, CT10 ESI, NNPDF2.1 [80], ABKM09 ED,
JR09 [82] at $Q^2 = 10$ GeV$^2$. All PDFs are shown with 68% CL uncertainties. The top row compares
NLO PDFs and the bottom row compares NNLO PDFs. These PDF sets have been chosen for
comparison because they have been selected for benchmarking by the PDF4LHC group [83] (though CT10
and NNPDF2.1 are updates of the benchmarked PDFs). All PDFs are broadly compatible but there are
differences of detail which can have important consequences for predictions of LHC cross sections.

A concise way to compare predictions for various LHC cross sections is to compare parton-parton
luminosities for quark-antiquark and gluon-gluon interactions,

$$\frac{\partial L_{gg}}{\partial \hat{s}} = \frac{1}{s} \int_\tau^1 \frac{dx_1}{x_1}\, f_g(x_1, \hat{s})\, f_g(x_2, \hat{s}) \qquad (16)$$

$$\frac{\partial L_{\Sigma(q\bar{q})}}{\partial \hat{s}} = \frac{1}{s} \int_\tau^1 \frac{dx_1}{x_1} \sum_{q=d,u,s,c,b} \left[ f_q(x_1, \hat{s})\, f_{\bar{q}}(x_2, \hat{s}) + f_{\bar{q}}(x_1, \hat{s})\, f_q(x_2, \hat{s}) \right] \qquad (17)$$

where $s$ is the centre of mass energy squared of the proton-proton collision and $x_1$ and $x_2$ are the
fractional momenta of the partons in each proton, such that the centre of mass energy squared of the
parton-parton collision is $\hat{s} = \tau s$, where $\tau = x_1 x_2$.

Fig. 42 shows $q$-$\bar{q}$ and $g$-$g$ luminosities for $p$-$p$ interactions at the LHC (see footnote 2) with
$\sqrt{s} = 7$ TeV, in ratio to those of the MSTW2008 PDF, for PDFs issued by CTEQ, NNPDF and HERAPDF. This
figure also shows the corresponding luminosity plots for the HERAPDF1.5, 1.6, 1.7 NLO updates described in
this review. Fig. 43 shows similar luminosity comparison plots for NNLO PDFs from MSTW, ABKM,
JR and HERAPDF.

2 Plots on top and middle rows from G. Watt, http://projects.hepforge.org/mstwpdf/pdf4lhc

Fig. 40: The parton distribution functions from HERAPDF1.7, $xu_v$, $xd_v$, $xS = 2x(\bar{U} + \bar{D})$, $xg$, at
$Q^2 = 10$ GeV$^2$. The experimental, model and parametrisation uncertainties are shown separately. The sea
and gluon distributions are scaled down by a factor 20.

Fig. 41: HERAPDF1.5 compared to other PDFs. Top: NLO PDFs from MSTW08, CTEQ66, NNPDF2.1, CT10. Bottom:
NNLO PDFs from MSTW08, NNPDF2.1, ABKM09, JR09.

Fig. 42: Left hand side: the $q$-$\bar{q}$ luminosity in ratio to that of MSTW2008 for various PDFs. Right hand
side: the same for the $g$-$g$ luminosities. Upper row: PDFs as benchmarked by the PDF4LHC group; middle row:
updates for CT10 and NNPDF2.1 issued in 2011; bottom row: the HERAPDF NLO updates described in this review.
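
The gluon-gluon luminosity of Eq. (16) can be sketched numerically. The example below is a toy: the function `toy_xg` is a hypothetical momentum density chosen only to have a plausible shape, not a fitted PDF, and the scale dependence of the PDFs is ignored. It integrates in $\ln x_1$ with the trapezoidal rule, using $x_2 = \tau / x_1$.

```python
import math

def toy_xg(x):
    """Hypothetical gluon momentum density x*g(x); NOT a fitted PDF."""
    return 3.0 * x ** -0.2 * max(0.0, 1.0 - x) ** 5

def gg_luminosity(tau, s, xf=toy_xg, n_steps=2000):
    """Approximate (1/s) * int_tau^1 dx1/x1 f(x1) f(tau/x1), with f(x) = xf(x)/x.

    The scale dependence of the PDFs is ignored in this toy version.
    """
    log_lo = math.log(tau)
    step = -log_lo / n_steps          # integrate in ln(x1) from ln(tau) to 0
    total = 0.0
    for k in range(n_steps + 1):
        x1 = math.exp(log_lo + k * step)
        x2 = tau / x1
        integrand = (xf(x1) / x1) * (xf(x2) / x2)    # dx1/x1 = d ln(x1)
        weight = 0.5 if k in (0, n_steps) else 1.0   # trapezoidal end points
        total += weight * integrand
    return total * step / s

s = 7000.0 ** 2  # (sqrt(s) = 7 TeV)^2 in GeV^2
for tau in (0.001, 0.01, 0.1):
    print(f"tau = {tau:5.3f}  dL_gg/dshat = {gg_luminosity(tau, s):.3e}")
```

With any PDF that rises towards low x, the luminosity falls steeply with increasing $\tau$, which is why the comparison figures show ratios to a reference PDF rather than absolute luminosities.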

There are several reasons why the PDF predictions differ. It is beyond the scope of the current
review to describe all the other PDFs in detail. However a few remarks can be made on the main differences
between HERAPDF and other PDFs. Firstly, they are based on different data sets and different choices
of cuts on these data sets. This is closely related to the differing ways in which the PDF uncertainties
are estimated, since the use of many different data sets has led to the use of increased $\chi^2$ tolerances
for some of the PDF sets. Secondly, different choices of PDF parametrisation are made and this impacts on
the size of the uncertainties. Thirdly, PDFs use different central values of $\alpha_s(M_Z)$ and this affects the
shape of the PDFs, particularly the gluon PDF. Fourthly, the PDF analyses differ in the schemes used to
account for heavy quark production and in the heavy quark masses used.

4.2.1 Correlated systematic uncertainties and $\chi^2$ tolerance

Most modern data used in PDF fits are statistically very precise such that systematic errors dominate.
Thus the correct treatment of correlated systematic errors becomes very important. In PDF fits done
prior to the year 2000 point-to-point correlated systematic errors were not specifically treated. They were
added in quadrature to the uncorrelated errors. This can lead to biassed results. The correct treatment
of correlated systematic errors is discussed in Ref. [68]. The consensus amongst PDF fitters is that the
uncertainty due to correlated systematic errors should be included in the theoretical prediction such that

$$F_i(p, s) = F_i^{\rm NLOQCD}(p) + \sum_\lambda s_\lambda \Delta^{\rm sys}_{i\lambda}$$

where $p$ are the PDF parameters, $s_\lambda$ represent independent (nuisance) variables for each source of
systematic uncertainty and $\Delta^{\rm sys}_{i\lambda}$ represents the one standard deviation correlated systematic
error on data point $i$ due to correlated error source $\lambda$. A representative form of the $\chi^2$ is then
given by

$$\chi^2 = \sum_i \frac{\left[ F_i(p, s) - F_i({\rm meas}) \right]^2}{\sigma_i^2} + \sum_\lambda s_\lambda^2$$

where $\sigma_i$ is the uncorrelated error on each data point. Thus the nuisance parameters are fitted together
with the PDF parameters. This method of treatment of correlated systematic errors has been termed the
Hessian method. An alternative is the Offset method in which $s_\lambda = 0$ for the central fit but the nuisance
parameters are varied when determining the error on the PDF parameters [68].

Fig. 43: Left hand side: the $q$-$\bar{q}$ luminosity in ratio to that of MSTW2008 for various PDFs. Right hand
side: the same for the $g$-$g$ luminosities. Upper row: NNLO PDFs as benchmarked by the PDF4LHC group (plots
from G. Watt, http://projects.hepforge.org/mstwpdf/pdf4lhc/); bottom row: the HERAPDF1.5 NNLO PDFs described
in this review compared to HERAPDF1.0 NNLO.
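
The $\chi^2$ with nuisance parameters can be written out directly. The sketch below is a minimal illustration with made-up numbers: two data points, one correlated systematic source, and a fixed theory prediction (no PDF parameters are fitted). The function name and all values are hypothetical.

```python
def chi2_with_nuisance(theory, data, sigma, delta_sys, s):
    """chi^2 = sum_i [F_i(p,s) - F_i(meas)]^2 / sigma_i^2 + sum_l s_l^2,
    with F_i(p,s) = theory_i + sum_l s_l * delta_sys[i][l]."""
    chi2 = sum(s_l ** 2 for s_l in s)  # penalty term for the nuisance parameters
    for t_i, d_i, sig_i, row in zip(theory, data, sigma, delta_sys):
        shifted = t_i + sum(s_l * dl for s_l, dl in zip(s, row))
        chi2 += ((shifted - d_i) / sig_i) ** 2
    return chi2

# Hypothetical inputs: two points, one correlated source.
theory = [1.0, 2.0]
data = [1.1, 2.2]
sigma = [0.1, 0.1]            # uncorrelated errors
delta_sys = [[0.1], [0.2]]    # one-sigma correlated shifts per point

# Nuisance parameter held at zero (as in the central fit of the Offset method).
chi2_offset_like = chi2_with_nuisance(theory, data, sigma, delta_sys, [0.0])
# Nuisance parameter shifted by one standard deviation.
chi2_shifted = chi2_with_nuisance(theory, data, sigma, delta_sys, [1.0])

# Hessian-style treatment: let the nuisance parameter float (simple grid scan).
scan = [(chi2_with_nuisance(theory, data, sigma, delta_sys, [k / 1000.0]), k / 1000.0)
        for k in range(0, 2001)]
best_chi2, best_s = min(scan)
print(chi2_offset_like, chi2_shifted, best_chi2, best_s)
```

In this toy case a coherent shift of the prediction absorbs most of the discrepancy at the cost of the penalty term, which is the mechanism by which the Hessian method fits the systematic shifts together with the other parameters.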

In the PDF fits of CTEQ/CT, MSTW and GJR/JR the Hessian method is used with increased $\chi^2$
tolerances such that a 68%(90%) CL is not set by a variation of $\Delta\chi^2 = 1(2.73)$ but by a larger variation.
Such increased $\chi^2$ tolerances are motivated by the use of many different input data sets
which are not all completely consistent. The tolerances are set so as to ensure that all the separate data
sets are fit to within their 68%(90%) CL. The tolerance can differ according to the parameter being fitted
(or more exactly according to an eigenvector combination of parameters, see Sec. 4.2.2) but as a rough
guide the CTEQ 90% CL tolerance is $\Delta\chi^2 \sim 100$ and the MSTW 90% CL tolerance is $\Delta\chi^2 \sim 30$ for the
MSTW2008 analysis (see footnote 3). The GJR/JR analyses use $\Delta\chi^2 \sim 50$. The use of these increased
$\chi^2$ tolerances has caused great controversy. For example, Pumplin [84] argues that a $\chi^2$ tolerance of at
most $\sim 10$ can be justified on the grounds of data incompatibility and that the more inflated values
implicitly account for parametrisation variations.
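
To give a feeling for what these tolerances mean, the sketch below assumes a purely quadratic $\chi^2$ around the minimum, in which case the displacement allowed by a tolerance $\Delta\chi^2 = T^2$ is $T$ times the displacement allowed by $\Delta\chi^2 = 1$. The function names are illustrative, and the labels simply repeat the rough values quoted above.

```python
import math

def tolerance_scale(delta_chi2):
    """Growth factor of an uncertainty relative to Delta chi^2 = 1,
    assuming a quadratic chi^2 around the minimum."""
    return math.sqrt(delta_chi2)

def cl90_to_cl68(uncertainty_90):
    """Rescale a 90% CL uncertainty to 68% CL with the factor 1/sqrt(2.73)."""
    return uncertainty_90 / math.sqrt(2.73)

for label, tol in [("conventional 68% CL", 1.0), ("conventional 90% CL", 2.73),
                   ("Pumplin upper bound", 10.0), ("MSTW 90% CL", 30.0),
                   ("GJR/JR", 50.0), ("CTEQ 90% CL", 100.0)]:
    print(f"{label:22s} Delta chi2 = {tol:6.2f} -> scale {tolerance_scale(tol):5.2f}")

print(f"90% -> 68% CL factor: {cl90_to_cl68(1.0):.3f}")
```

In the quadratic approximation a tolerance of 100 therefore corresponds to parameter displacements ten times larger than the conventional $\Delta\chi^2 = 1$ criterion would allow.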

The ABKM group does not use increased tolerances and that is why their PDF uncertainties are
generally smaller than those of other groups. HERAPDF also does not use an increased tolerance but
considers additional model and parametrisation uncertainties, see Sec. 4. The NNPDF group use a
completely different method of estimating PDF uncertainties, see Sec. 4.2.4.

For the HERAPDF the Hessian procedure has already been applied to the data combination to set
the best values for the systematic shifts and the combination procedure itself results in greatly reduced
systematic errors, such that there is no longer a significant difference in the PDF uncertainties obtained
using the Offset and Hessian methods of treating systematic uncertainties. The good $\chi^2$ for the
combination fit also establishes that the resulting data set is very consistent, see Sec. 3.1, such that in the
HERAPDF the conventional tolerance $\Delta\chi^2 = 1$ is appropriate for setting 68% CL uncertainties.

3 Note that MSTW provide both 68% CL and 90% CL uncertainties, whereas for CTEQ/CT a factor of $1/\sqrt{2.73}$
must be applied to the 90% CL uncertainties to obtain 68% CL uncertainties.



4.2.2 Diagonalisation and Eigenvector PDF sets 

In either the Hessian or the Offset method, the Hessian matrices and covariance matrices are not, in
general, diagonal. The variation of $\chi^2$ with respect to some parameters is much more rapid than that of
others, but because the parameters are correlated with each other the effect of each parameter is not clear.
When evaluating uncertainties on physical observables it can be an advantage to use an eigenvector basis of
PDFs, which provides an optimized representation of parameter space in the neighbourhood of the
minimum. The eigenvalues of the covariance matrix represent the squares of the errors on the combinations
of parameters given by the corresponding eigenvectors.

An eigenvector basis of PDFs is the usual way of summarizing the results of a PDF analysis
including its error estimates. Two sets of PDF parameters must be supplied for each eigenvalue, representing
displacement up and down along its eigenvector direction by the $\chi^2$ tolerance. The symmetric error on a
quantity $F$ which is a function of the PDF parameters is then simply given by

$$\langle \sigma_F^2 \rangle = \sum_j \left[ \frac{F(p_j^+) - F(p_j^-)}{2} \right]^2$$

where $F(p_j^+)$, $F(p_j^-)$ are the values of $F$ evaluated up and down along eigenvector $j$. Asymmetric errors
may be evaluated by the prescription:

$$\langle \sigma_{F+}^2 \rangle = \sum_j \left[ \max\left( F(p_j^+) - F(p^0),\; F(p_j^-) - F(p^0),\; 0 \right) \right]^2$$

$$\langle \sigma_{F-}^2 \rangle = \sum_j \left[ \max\left( F(p^0) - F(p_j^+),\; F(p^0) - F(p_j^-),\; 0 \right) \right]^2$$

where $F(p^0)$ is the central value of $F$.
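
The symmetric and asymmetric prescriptions translate directly into code. The sketch below is illustrative only: the function names are not part of any library, and the input values of the observable for three eigenvector pairs are invented.

```python
import math

def symmetric_error(f_plus, f_minus):
    """sqrt of sum_j [(F(p_j^+) - F(p_j^-)) / 2]^2."""
    return math.sqrt(sum(((up - dn) / 2.0) ** 2 for up, dn in zip(f_plus, f_minus)))

def asymmetric_errors(f_central, f_plus, f_minus):
    """Return (upward, downward) errors from the max(...) prescription."""
    up2 = sum(max(up - f_central, dn - f_central, 0.0) ** 2
              for up, dn in zip(f_plus, f_minus))
    dn2 = sum(max(f_central - up, f_central - dn, 0.0) ** 2
              for up, dn in zip(f_plus, f_minus))
    return math.sqrt(up2), math.sqrt(dn2)

# Hypothetical values of an observable F for three eigenvector pairs.
f0 = 10.0
f_up = [10.3, 9.8, 10.1]     # F evaluated at p_j^+
f_down = [9.6, 10.1, 10.2]   # F evaluated at p_j^-

sym = symmetric_error(f_up, f_down)
err_up, err_down = asymmetric_errors(f0, f_up, f_down)
print(f"F = {f0} +/- {sym:.3f} (symmetric)")
print(f"F = {f0} +{err_up:.3f} -{err_down:.3f} (asymmetric)")
```

The third pair in this example moves the observable upwards in both directions, so it contributes only to the upward error; this is the situation the asymmetric prescription is designed to handle.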

The PDFs from the HERAPDF are made public in this format via the LHAPDF ( http://lhapdf.hepforge. 
interface. As well as the eigenvector sets, which give the experimental uncertainty of the HERAPDF, 
further sets are provided to cover the model and parametrisation variations. These should be combined
with the experimental errors as specified in Sec. 4.1. Further PDF sets are also provided for a range of
fixed $\alpha_s(M_Z)$ values, so that the uncertainty due to $\alpha_s(M_Z)$ variation may also be evaluated.

The LHAPDF library is also the repository for the PDF sets from other PDF fitting groups. 

4.2.3 Choice of data sets and kinematic cuts 

The CTEQ6.6, MSTW2008, ABKM09 and JR09 PDF analyses do not use the recently combined
inclusive cross section data from HERA-I [5], which are up to three times more accurate than the separate
H1 and ZEUS data sets used by previous PDF analyses. These combined HERA data are shifted in
normalisation by $\sim 3\%$ with respect to the previous HERA data, and this explains the higher luminosity
of the HERAPDF at low $\tau$.

Conversely the HERAPDF analysis uses only HERA data, whereas the CTEQ, MSTW and NNPDF 
analyses are 'global' PDFs which also use: older fixed target data, both from DIS and from Drell-Yan 
production; Tevatron W, Z cross section data and jet production data. The ABKM and JR PDFs each use 
some but not all of these non-HERA data sets. 

The use of a single consistent data set with a clear statistical interpretation of uncertainty limits
was one of the primary motivations behind the HERAPDF. However there are other reasons why the
use of some of the other data sets may lead to further uncertainties. Firstly, the neutrino-Fe fixed-target
scattering data from CCFR and NuTeV, which are often used to help to determine the valence densities,
need corrections for nuclear effects (the 'EMC effect'). Although such nuclear corrections are made in
the global PDF analyses, they are not perfectly determined and the uncertainty due to these corrections is
not fully accounted for [55]. More recently similar criticisms have been made of the use of deuterium target
data (either in DIS or Drell-Yan). Accardi et al. [56] have reconsidered deuterium corrections for the fixed
target data. They find large uncertainties in these corrections and this results in greater (unaccounted for)
uncertainty in the high-x d-quark for fits where the deuterium data are the principal source of information
on the d-quark. (For the HERAPDF the information on the d-quark comes from CC $e^+p$ scattering.)

Fixed proton target data do not suffer from these problems, but the kinematic reach of such data
does extend into the high-x, low-$Q^2$ region, where the theoretical interpretation of the data requires
consideration of target mass corrections and higher twist terms. Most PDF analyses make a $W^2 \gtrsim 15$ GeV$^2$
cut to avoid this region (for the HERAPDF this is unnecessary since all HERA data are at large $W$). The
ABKM analysis chooses to include the low-$W$ data and model the higher twist terms. The high-x region
is also receiving attention from the CTEQ-JLab group [85].

A further problem in the use of older fixed target data is that results were often presented and used
in terms of $F_2$ rather than reduced cross sections. ABM [86] have examined the use of NMC $F_2$ data
in the global fits. The NMC extraction of $F_2$ relied on assumptions on the value of $F_L$ which are not
consistent with modern QCD calculations. ABM find that using NMC published values of $F_2$, rather
than the NMC cross section data, erroneously raises their extracted values of $\alpha_s$.

The HERAPDF avoids bias from erroneous assumptions about heavy target corrections, deuterium
corrections, higher twist corrections and $F_L$ corrections, by using only HERA pure proton target cross
section data, but a price is paid in terms of the uncertainties of the high-x parton distributions, which are
generally larger than those of the other groups.

It is also notable that the HERAPDFs have a harder high-x sea and a softer high-x gluon PDF at 
NLO. It has been suggested that this may be because Tevatron jet data are not included in the HERAPDF 
fit. However the story is not quite so simple. 

Global fits use Tevatron high-$E_T$ jet production data to help to pin down the high-x gluon. The
HERAPDF analysis uses HERA jet data for the same purpose, although the HERA jet data do not extend
to as high x values as the Tevatron jet data. These Tevatron jet data have very large correlated systematic
uncertainties compared to HERA jet data, such that much trust must be put in the evaluation of systematic
uncertainties. Tevatron Run-I jet data suggested a hard high-x gluon, but Run-II data soften this. The
MSTW analysis uses only Run-II data whereas the CT/CTEQ analyses use both Run-I and Run-II data.
These choices can explain the harder gluon luminosities of the CT PDFs at high x. Watt and Thorne [87]
obtain poor $\chi^2$ when comparing the Tevatron jet data to the HERAPDF1.0, 1.5 predictions. However
their fits only compare to the central predictions of the HERAPDF. A more valid comparison would
account for the HERAPDF error bands. If the Tevatron jet data are input to the HERAPDF1.5 fit a much
better $\chi^2$ ($\chi^2$/ndp = 1.48 for CDF and 1.35 for D0 jets) is obtained. Significantly, the resulting PDFs do
not lie outside the HERAPDF1.5 error bands (although they do imply a harder high-x gluon, on the upper
edge of the error band). The reason that the HERAPDF can give a reasonable description of Tevatron jet
data, while still having a relatively soft high-x gluon PDF, is that high-$E_T$ jets result not only from the
high-x gluon but also from high-x quarks and HERAPDF has a rather hard high-x quark PDF.

The ABKM analysis also chooses not to use Tevatron jet data, partly because new physics effects
may be hidden in the data, biassing the PDFs. Consequently, ABKM has a soft high-x gluon luminosity.
Nevertheless, ABM gives a good description of Tevatron jet data [88]. A further issue regarding the use
of Tevatron jet data concerns their use together with deuterium fixed-target data. The greater uncertainty
in the high-x d-quark, due to uncertain deuterium corrections, will feed into the high-x gluon PDF,
since the $d$-$g$ process provides a substantial part of the Tevatron jet cross section. However this larger
uncertainty is usually not accounted for [56].



4.2.4 Parametrisation and model uncertainty 

HERAPDF central fits have a relatively small number of parameters, $\sim 14$. However, parametrisation
uncertainty is estimated by making fits with additional parameters freed, or with a change of the choice
of the starting scale, $Q_0^2$, which is equivalent to a re-parametrisation. The comparison of HERAPDF1.5,
which uses 10 free parameters, and HERAPDF1.5f, which uses 14 free parameters, in Fig. 34 shows
that this procedure largely accounts for the uncertainty introduced when the extra parameters are freed
in the central fit.

The HERAPDF results in similar central values and uncertainty estimates to those of the global
PDFs in many kinematic regions. In the case of the central values this is because the HERA data
dominate the global input data. In the case of the uncertainty estimates it is partly due to the fact that the
HERAPDF experimental uncertainties are augmented by estimates of the model and parametrisation
variations, which are not accounted for in the CT and MSTW analyses. This lends support to the idea that
the increased $\chi^2$ tolerances of MSTW and CT partly cover some of these additional model and
parametrisation uncertainties.

The NNPDF global analysis uses a completely different approach both to PDF parametrisation
and to the determination of PDF uncertainties. All errors (statistical, systematic and normalisation) as
given by experimental collaborations are represented by Monte Carlo replica sets of artificial data. A
neural net is used to learn the shape of these replicas rather than using a fixed parametrisation at the
starting point. This can be regarded as equivalent to using a very large number of parameters. The PDFs
are not determined by a $\chi^2$ fit but by stopping the learning algorithm before overlearning occurs. The
results are not presented in terms of eigenvectors of the fit but in terms of a set of replicas such that their
mean gives the best estimate of the central PDF and the standard deviation from this mean gives the
68% CL uncertainty estimate. It is remarkable that this entirely different procedure gives broadly similar
central values and uncertainty estimates to those of the MSTW and CTEQ global fits. To some extent this
vindicates the standard procedure, in particular with regard to the use of increased $\chi^2$ tolerances to set
the 68% CL uncertainties.

4.2.5 The value of $\alpha_s(M_Z)$

Some groups (HERAPDF, CTEQ, NNPDF) adopt a fixed value of $\alpha_s(M_Z)$, inspired by the PDG value,
and others (ABKM, GJR, MSTW) fit $\alpha_s(M_Z)$ simultaneously with the PDF parameters and use their
best fit value. All groups bar GJR use values $\sim 0.118 - 0.120$ at NLO but there is a definite low (0.113)
versus high (0.117) split at NNLO. HERAPDF, CT(EQ), NNPDF and MSTW provide PDFs at different
$\alpha_s(M_Z)$ values so that the effect of variation of $\alpha_s(M_Z)$ on cross section predictions can be evaluated.

MSTW obtain the highest value of $\alpha_s(M_Z)$, at both NLO and NNLO, and these high values have
been attributed to the use of Tevatron jet data in their fits. However, ABM have tried inputting these jet
data to their fit and have found that this has only a small effect on their extraction of a low value of
$\alpha_s(M_Z)$ [88]. There is also a 'folk-lore' that DIS data prefer lower values of $\alpha_s(M_Z)$. However both
MSTW (89) and NNPDF |H have performed DIS only fits in which they find that only the BCDMS 
data prefer low $\alpha_s(M_Z)$ values. The HERA data actually prefer quite high values as shown in Sec. 4.1.5.
The effect of this on the gluon-gluon luminosity may be seen in Fig. 42 by comparing the
HERAPDF1.6 curve, with fixed $\alpha_s(M_Z) = 0.1176$, to the HERAPDF1.6 free $\alpha_s(M_Z)$ curve, which
has $\alpha_s(M_Z) = 0.1202$. The larger $\alpha_s(M_Z)$ value leads to a smaller low-x gluon and a somewhat harder
high-x gluon such that the gluon-gluon luminosity is then in better agreement with that of MSTW2008,
which also uses a large $\alpha_s(M_Z)$ value.

4.2.6 Heavy Quark Schemes 

Fig. 44: Left hand side: data on the direct W-asymmetry from CDF; right hand side: data on the $Z^0$ rapidity
spectrum from CDF; compared to NLO predictions from CTEQ6.6, MSTW08 and HERAPDF1.5. The blue band
indicates the uncertainties on the HERAPDF prediction.

The ABKM and GJR groups use Fixed-Flavour-Number (FFN) treatments, HERAPDF, CTEQ and
MSTW use various General-Mass-Variable-Flavour-Number (GMVFN) treatments and NNPDF2.0 [79]
used a Zero-Mass-Variable-Flavour-Number (ZMVFN) treatment. These heavy quark schemes are
discussed in Ref. [92]. The use of the zero-mass treatment explains why the NNPDF2.0 luminosities lie
lower than those of CTEQ, MSTW and HERAPDF at low $\tau$. This may be seen by comparing the top row
of Fig. 42 to the middle row where the NNPDF2.1 luminosity, which used a GMVFN, is seen to be in
much better agreement with the other PDFs. This is because, when charm mass is accounted for, charm
is suppressed at threshold and the light quark densities must be somewhat larger in order to describe the
deep inelastic cross-section. However not all GMVFNs are the same. Predictions for F£ differ between 
schemes lf9TTl and the choice of scale within a scheme can also affect predictions. The value of the charm 
and beauty masses also differ between the PDF analyses. HERAPDF, NNPDF and MSTW now provide
PDFs at different charm and beauty mass values so that the effect of this can be evaluated. In future the
combined data on $F_2^{c\bar{c}}$, discussed in Sec. 4.1.2, should help to reduce the uncertainty on PDFs coming
from the choice of scheme and the value of the charm mass.

The heavy quark mass schemes described in Sec. 4.1.2 all use a charm quark mass parameter
which should be the pole mass. However the pole mass has a strong dependence on the order of the
perturbative calculation and may best be regarded as a parameter. It may be better to consider the
$\overline{\rm MS}$ running mass. HERA data on $F_2^{c\bar{c}}$ have also been used for a determination of this mass [93].

4.3 Comparisons of HERAPDF predictions to Tevatron and LHC data 

Finally we present some representative comparisons of HERAPDF predictions to PDF sensitive data
from the Tevatron and LHC colliders. Fig. 44 presents comparisons to CDF data on the direct
W-asymmetry [94] and $Z^0$ rapidity spectrum [95]. These data are well described by the HERAPDF1.5
prediction (see footnote 4). A fit of the data to the central value of the prediction yields a $\chi^2$ of 36 for 28
data points for the $Z^0$ data and of 41 for 13 data points for the asymmetry data. These descriptions are
improved if the data are input to the HERAPDF fit, to $\chi^2$/ndp = 26/28 for the $Z^0$ data and 21/13 for the
asymmetry data (see footnote 5). The resulting PDFs lie well within the HERAPDF1.5 error bands. The
HERAPDF uncertainty bands could be reduced by input of these data. This is a future project beyond the
scope of the current review.

Fig. 45 presents comparisons of HERAPDF1.0 predictions to D0 data on inclusive jet production [96].
Because of the large correlated systematics of these data it is not possible to assess the
quality of the description by eye. If these data are input to the HERAPDF1.5 fit a $\chi^2$/ndp = 145/110
can be obtained. Similarly if CDF inclusive jet production data [97] are input to the HERAPDF1.5
NLO fit a $\chi^2$/ndp = 113/76 is obtained. In both cases the resulting PDFs move to the edge of the
HERAPDF1.5 error band, tending to favour a harder high-x gluon. For this reason, the HERAPDF1.6
$\alpha_s(M_Z) = 0.1202$ fit, which already has a harder gluon than the 1.5 fit, gives the best description of
these data out of all the NLO HERAPDF sets.

4 The predictions of the HERAPDF1.6 and 1.7 PDFs are very similar to those of HERAPDF1.5.

5 Note that these asymmetry data are as well described by the HERAPDF, in terms of $\chi^2$/ndp, as they are
by other PDFs which have used them, e.g. NNPDF.

Fig. 45: D0 data on inclusive jet production compared to NLO predictions from HERAPDF1.0.

The HERAPDF1.5 NNLO PDF fit gives a better description of these data than the NLO PDFs: the
central PDF of HERAPDF1.5NNLO yields a $\chi^2$ per data point of $\chi^2$/ndp = 72/76. However this can
only be approximate since the theoretical description of the jet data itself contains only an approximate
calculation for the NNLO jet cross-section.

Fig. 46 presents comparisons of various PDFs, including HERAPDF1.5, to ATLAS data on the
W decay lepton pseudorapidity distributions and the $Z^0$ rapidity distribution, as well as on the W decay
lepton asymmetry [98]. Fig. 47 presents comparisons of HERAPDF1.5 predictions to 234 pb$^{-1}$ of preliminary
CMS 2011 data on the W decay lepton asymmetry [100]. These LHC W and Z cross section data are
well described by the HERAPDF. However, a detailed study by the ATLAS Collaboration [99] using the
ATLAS W and Z data and the HERA-I combined data has indicated a preference of the ATLAS data
for unsuppressed strangeness at $x \sim 0.01$. Further discussion of this is beyond the scope of the present
review.

Fig. 48 presents comparisons of various PDF predictions, including HERAPDF1.5, to ATLAS
data on inclusive jet production [101]. Fig. 49 presents comparisons of various PDF predictions,
including HERAPDF1.5, to CMS data on inclusive jet production [102]. Because of the large
correlated systematics of these data it is not possible to assess the quality of the description by eye. The
ATLAS jet data are published with information on these correlations and a $\chi^2$ per data point of $\sim 60/90$
can be obtained for each of the HERAPDFs, and the $\chi^2$ values for the MSTW, CT and NNPDF PDFs are
similar. Thus the data are not yet very discriminating; however they indicate a preference for a somewhat
less hard high-x gluon than the Tevatron jet data.

5 Summary 

Deep inelastic lepton-hadron scattering data from the HERA collider now dominate the world data on
deep inelastic scattering since they cover an unprecedented kinematic range. The H1 and ZEUS
experiments are combining their data in order to provide the most complete and accurate set of deep-inelastic
data as the legacy of HERA.

Fig. 46: Comparisons of ATLAS data on $W^+$ and $W^-$ decay lepton pseudorapidity spectra, $Z^0$ rapidity spectra
and W decay lepton asymmetry data to NNLO predictions from MSTW08, HERAPDF1.5, ABKM09, JR09.

Fig. 47: CMS data on W decay lepton asymmetry compared to NLO predictions from HERAPDF1.5, MSTW08 and CT10W.

Fig. 48: ATLAS data on inclusive jet production in central and forward rapidity regions in ratio to the NLO
predictions of CT10 and compared to NLO predictions from HERAPDF1.5, NNPDF2.1 and MSTW08.

Fig. 49: CMS data on inclusive jet production in central and forward rapidity regions in ratio to the NLO
predictions of CT10 and compared to NLO predictions from HERAPDF1.0, ABKM09, MSTW08 and NNPDF2.1.

Data on inclusive cross-sections have been combined for the HERA-I phase of running and a
preliminary combination has been made also using the HERA-II data. This latter exercise also includes the
data run at lower proton beam energies in 2007. Combination of $F_2^{c\bar{c}}$ data is underway, and combination
of $F_2^{b\bar{b}}$ data and of jet data is foreseen.

The HERA collaborations have used these combined data to determine parton distribution
functions (PDFs) in the proton. Because the HERA experiments investigated $e^+p$ and $e^-p$, charged
current (CC) and neutral current (NC) scattering, the inclusive HERA data provide information on flavour
separated up- and down-type quarks and antiquarks and on the gluon, from its role in the scaling
violations of perturbative quantum chromodynamics. The lower proton beam energy data provide further
information on the gluon at small $x \lesssim 0.01$ since they allow a determination of the longitudinal
structure function. The charm data provide additional information on heavy quark schemes and heavy quark
mass values. The jet data (separate data from H1 and ZEUS at the time of writing) provide additional
information on the gluon PDF in the x range $0.01 < x < 0.1$ and on $\alpha_s(M_Z)$.

The analysis of these data sets has resulted in the HERAPDF parton distribution functions. In
this review we have described and compared these sets with each other and with PDF sets from other
groups. We have also demonstrated that the HERAPDF sets give successful descriptions of data on W
and Z production and on jet production from the Tevatron and the LHC. The currently recommended
versions of these PDFs, which are available on LHAPDF, are the HERAPDF1.5 NLO and NNLO sets.